Experimental Error, Kurtosis, Activity Cliffs, and Methodology: What Limits the Predictivity of Quantitative Structure–Activity Relationship Models?
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Experimental_Error_Kurtosis_Activity_Cliffs_and_Methodology_What_Limits_the_Predictivity_of_Quantitative_Structure_Activity_Relationship_Models_/12133212
Description:
Given a particular descriptor/method combination, some quantitative
structure–activity relationship (QSAR) datasets are very predictive
by random-split cross-validation while others are not. Recent literature
in modelability suggests that the limiting issue for predictivity
is in the data, not the QSAR methodology, and the limits are due to
activity cliffs. Here, we investigate, on in-house data, the relative
usefulness of experimental error, distribution of the activities,
and activity cliff metrics in determining how predictive a dataset
is likely to be. We include unmodified in-house datasets, datasets
that should be perfectly predictive based only on the chemical structure,
datasets where the distribution of activities is manipulated, and
datasets that include a known amount of added noise. We find that
activity cliff metrics determine predictivity better than the other
metrics we investigated, whatever the type of dataset, consistent
with the modelability literature. However, such metrics cannot distinguish
real activity cliffs due to large uncertainties in the activities.
We also show that a number of modern QSAR methods, and some alternative
descriptors, are equally bad at predicting the activities of compounds
on activity cliffs, consistent with the assumptions behind “modelability.”
Finally, we relate time-split predictivity with random-split predictivity
and show that different coverages of chemical space are at least as
important as uncertainty in activity and/or activity cliffs in limiting
predictivity.
Created:
2020-03-24



