Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Molecular_Similarity-Based_Domain_Applicability_Metric_Efficiently_Identifies_Out-of-Domain_Compounds/7359536
Description:
Domain applicability (DA) is a concept introduced to gauge the
reliability of quantitative structure–activity relationship
(QSAR) predictions. A leading DA metric is ensemble variance, which is defined as the variance of predictions by an ensemble
of QSAR models. However, this metric fails to identify large prediction
errors in melting point (MP) data, despite the availability of large
training data sets. In this study, we examined the performance of
this metric on MP data and found that, for most molecules, ensemble
variance increased as their structural similarity to the training
molecules decreased. However, the metric decreased for “out-of-domain”
molecules, i.e., molecules with little to no structural similarity
to the training compounds. This explains why ensemble variance fails
to identify large prediction errors. In contrast, a new molecular
similarity-based DA metric that considers the contributions
of all training molecules in gauging the reliability of a prediction
successfully identified predictions of MP data for which the errors
were large. To validate our results, we used four additional data
sets of diverse molecular properties. We divided each data set into
a training set and a test set at a ratio of approximately 2:1, ensuring
a small fraction of the test compounds are out of the training domain.
We then trained random forest (RF) models on the training data and
made RF predictions for the test set molecules. Results from these
data sets confirm that the new DA metric significantly outperformed
ensemble variance in identifying predictions for out-of-domain compounds.
For within-domain compounds, the two metrics performed similarly,
with ensemble variance marginally but consistently outperforming the
new DA metric. The new DA metric, which does not rely on an ensemble
of QSAR models, can be deployed with any machine-learning method,
including deep neural networks.
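
The two kinds of metric contrasted above can be illustrated with a minimal sketch. It is not the authors' implementation: the description does not give the formula for the new DA metric, so the exponential weighting, the parameter `a`, the set-of-bits fingerprint representation, and the function names below are all assumptions chosen for illustration. The sketch only shows the general idea of summing similarity-weighted contributions from every training molecule, next to ensemble variance computed from per-model predictions.

```python
import math
from statistics import pvariance


def ensemble_variance(predictions):
    """Variance of the predictions made by the individual models of an ensemble."""
    return pvariance(predictions)


def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similarity_da(query_fp, training_fps, a=3.0):
    """Hypothetical similarity-based DA score.

    Every training molecule contributes a weight that decays with its
    Tanimoto distance to the query. The weighting form and the value of
    `a` are assumptions, not taken from the source.
    """
    total = 0.0
    for fp in training_fps:
        s = tanimoto_similarity(query_fp, fp)
        if s > 0.0:
            # Weight is 1 for an identical fingerprint and tends to 0 as s -> 0.
            total += math.exp(-a * (1.0 - s) / s)
    return total


# Toy fingerprints (sets of on-bit indices), invented for illustration only.
training = [{1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 6}]
in_domain_query = {1, 2, 3, 4}
out_of_domain_query = {7, 8, 9}

print(similarity_da(in_domain_query, training))
print(similarity_da(out_of_domain_query, training))
print(ensemble_variance([1.0, 3.0]))
```

In this toy setup a query that shares no bits with any training fingerprint receives a score of zero, while a query resembling the training molecules receives a positive score. Because the score depends only on fingerprints and not on the model, it can in principle be paired with any predictor, which is consistent with the claim above that the new metric does not rely on an ensemble.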
Created:
2018-11-19



