Three Useful Dimensions for Domain Applicability in QSAR Models Using Random Forest
Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/Three_Useful_Dimensions_for_Domain_Applicability_in_QSAR_Models_Using_Random_Forest/2537326
Description:
One popular metric for estimating the accuracy of prospective
quantitative structure–activity relationship (QSAR) predictions
is based on the similarity of the compound being predicted to compounds
in the training set from which the QSAR model was built. More recent
work in the field has indicated that other parameters might be equally
or more important than similarity. Here we make use of two additional
parameters: the variation of prediction among random forest trees
(less variation among trees indicates more accurate prediction) and
the prediction itself (certain ranges of activity are intrinsically
easier to predict than others). The accuracy of prediction for a QSAR
model, as measured by the root-mean-square error, can be estimated
by cross-validation on the training set at the time of model-building
and stored as a three-dimensional array of bins. This is an obvious
extension of the one-dimensional array of bins we previously proposed
for similarity to the training set [Sheridan et al. J. Chem.
Inf. Comput. Sci. 2004, 44,
1912–1928]. We show that using these three parameters simultaneously
adds much more discrimination in prediction accuracy than any single
parameter. This approach can be applied to any QSAR method that produces
an ensemble of models. We also show that the root-mean-square errors
produced by cross-validation are predictive of root-mean-square errors
of compounds tested after the model was built.
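
The three-dimensional array of bins described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (build_rmse_bins, lookup_rmse), the choice of three equal-population bins per dimension, and the synthetic data are all assumptions made for illustration. The sketch assumes that cross-validation has already produced, for each training compound, its similarity to the rest of the training set, the standard deviation of the predictions across the random forest trees, the prediction itself, and the observed activity.

```python
import numpy as np

def build_rmse_bins(similarity, tree_sd, prediction, observed, n_bins=3):
    """Return (edges, rmse); rmse[i, j, k] is the cross-validated RMSE of the
    compounds falling in similarity bin i, tree-SD bin j, prediction bin k."""
    dims = [np.asarray(a, dtype=float) for a in (similarity, tree_sd, prediction)]
    observed = np.asarray(observed, dtype=float)
    sq_err = (dims[2] - observed) ** 2

    # Assumption: interior bin edges at equal-population quantiles.
    # The original work may define its bins differently.
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = [np.quantile(d, qs) for d in dims]
    idx = tuple(np.digitize(d, e) for d, e in zip(dims, edges))

    sums = np.zeros((n_bins,) * 3)
    counts = np.zeros((n_bins,) * 3)
    np.add.at(sums, idx, sq_err)   # accumulate squared errors per bin
    np.add.at(counts, idx, 1.0)    # count compounds per bin

    with np.errstate(invalid="ignore", divide="ignore"):
        rmse = np.sqrt(sums / counts)  # NaN marks an empty bin
    return edges, rmse

def lookup_rmse(edges, rmse, similarity, tree_sd, prediction):
    """Estimated RMSE for a new compound, read from the bin it falls into."""
    i, j, k = (int(np.digitize(v, e))
               for v, e in zip((similarity, tree_sd, prediction), edges))
    return rmse[i, j, k]

# Synthetic stand-in for cross-validation output (illustrative only, not real data):
# errors grow as similarity drops and as the spread among trees grows.
rng = np.random.default_rng(0)
n = 2000
similarity = rng.uniform(0.0, 1.0, n)
tree_sd = rng.uniform(0.1, 1.0, n)
observed = rng.normal(6.0, 1.0, n)
prediction = observed + rng.normal(0.0, tree_sd * (1.5 - similarity))

edges, rmse = build_rmse_bins(similarity, tree_sd, prediction, observed)
reliable = lookup_rmse(edges, rmse, similarity=0.9, tree_sd=0.15, prediction=6.0)
unreliable = lookup_rmse(edges, rmse, similarity=0.1, tree_sd=0.95, prediction=6.0)
print(f"estimated RMSE: {reliable:.2f} (reliable) vs {unreliable:.2f} (unreliable)")
```

At prediction time, the same three quantities are computed for a new compound and the stored RMSE of the matching bin serves as its error estimate. Empty bins are left as NaN here; a practical version would need a fallback, such as borrowing from neighboring bins.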
Created:
2012-03-26



