Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes
Source: Figshare | Updated: 2020-08-10 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Impact_of_Chemist-In-The-Loop_Molecular_Representations_on_Machine_Learning_Outcomes/12824566
Description:
The development of molecular descriptors is a central challenge in cheminformatics. Most approaches use algorithms that extract atomic environments or rely on end-to-end machine learning. However, a looming question is how these approaches compare with the critical eye of trained chemists. The CAS fingerprint engages expert chemists to curate chemical motifs that they deem likely to influence bioactivity. In this paper, we benchmark the CAS fingerprint against commonly used fingerprints on a well-established benchmark set of 88 targets. We show that the CAS fingerprint outperforms most of the commonly used molecular fingerprints. Analysis of the CAS fingerprint reveals that experts tend to select features that are rarely reported in the literature, though not all rare features are selected. Our analysis also shows that the CAS fingerprint provides a different source of information from other commonly used fingerprints. These results suggest that anthropomorphic insights do have predictive power and highlight the importance of a chemist-in-the-loop approach in the era of machine learning.
Created:
2020-08-10



