Three Useful Dimensions for Domain Applicability in QSAR Models Using Random Forest

Source: NIAID Data Ecosystem (indexed 2026-03-07)

Download link:

https://figshare.com/articles/dataset/Three_Useful_Dimensions_for_Domain_Applicability_in_QSAR_Models_Using_Random_Forest/2537326

Description:

One popular metric for estimating the accuracy of prospective quantitative structure–activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be equally or more important than similarity. Here we make use of two additional parameters: the variation of prediction among random forest trees (less variation among trees indicates more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model-building and stored as a three-dimensional array of bins. This is an obvious extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912–1928]. We show that using these three parameters simultaneously adds much more discrimination in prediction accuracy than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of root-mean-square errors of compounds tested after the model was built.
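The binning idea described above can be illustrated with a minimal sketch. It assumes the cross-validation step has already produced, for each training compound, a similarity to the training set, a spread of predictions across the trees, a predicted activity, and an observed activity. The function names, the bin edges, and the toy records are all hypothetical and are not taken from the paper, and a dictionary keyed by three bin indices stands in for the three-dimensional array.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def bin_index(value, edges):
    """Index of the bin holding `value`, given sorted interior bin edges."""
    return bisect_right(edges, value)


def bin_key(sim, sd, pred, sim_edges, sd_edges, pred_edges):
    """Map the three parameters to a (similarity, spread, prediction) bin."""
    return (
        bin_index(sim, sim_edges),
        bin_index(sd, sd_edges),
        bin_index(pred, pred_edges),
    )


def build_rmse_bins(records, sim_edges, sd_edges, pred_edges):
    """Accumulate squared errors per bin and return the RMSE of each bin.

    `records` holds (similarity, tree_sd, predicted, observed) tuples,
    assumed here to come from cross-validation on the training set.
    """
    sq_err = defaultdict(float)
    count = defaultdict(int)
    for sim, sd, pred, obs in records:
        key = bin_key(sim, sd, pred, sim_edges, sd_edges, pred_edges)
        sq_err[key] += (pred - obs) ** 2
        count[key] += 1
    return {key: math.sqrt(sq_err[key] / count[key]) for key in count}


def estimate_rmse(bins, sim, sd, pred, sim_edges, sd_edges, pred_edges):
    """Look up the stored RMSE for a new prediction; None if the bin is empty."""
    return bins.get(bin_key(sim, sd, pred, sim_edges, sd_edges, pred_edges))


# Hypothetical toy data: two bins per dimension, three cross-validated compounds.
SIM_EDGES, SD_EDGES, PRED_EDGES = [0.5], [0.5], [6.0]
records = [
    (0.9, 0.1, 5.0, 5.2),  # similar to training set, trees agree
    (0.9, 0.1, 5.0, 4.8),
    (0.3, 0.8, 7.0, 5.0),  # dissimilar, trees disagree
]
bins = build_rmse_bins(records, SIM_EDGES, SD_EDGES, PRED_EDGES)
expected_error = estimate_rmse(bins, 0.8, 0.2, 5.5, SIM_EDGES, SD_EDGES, PRED_EDGES)
```

In this sketch a new compound that lands in a bin with no cross-validation data returns None. How sparsely populated bins should be handled is a design choice the description above does not specify.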

Created:

2012-03-26
