
PVLOO-Based Training Set Selection Improves the External Predictability of QSAR/QSPR Models

Source: Figshare. Updated: 2017-04-27. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/PV_sub_LOO_sub_-Based_Training_Set_Selection_Improves_the_External_Predictability_of_QSAR_QSPR_Models/4924262
Description:
In QSAR/QSPR modeling, statistical external validation is indispensable for assessing the predictability of a model. A division algorithm is commonly used to select training sets from chemical compound libraries or collections before external validation. In this study, a division method based on the posterior variance of leave-one-out cross-validation (PVLOO) of the Gaussian process (GP) has been developed with the goal of producing more predictive models. Four structurally diverse data sets of good quality were collected from the literature, and models were redeveloped and validated using the following training set selection methods: four kinds of PVLOO-based training set selection methods with three types of covariance functions (squared exponential, rational quadratic, and neural network covariance functions), the Kennard–Stone algorithm, and random division. The root mean squared error (RMSE) of external validation reported for each model serves as the basis for the final comparison. The results indicate that training sets with higher PVLOO values have statistically better external predictability than training sets generated by the other division methods discussed here. One possible explanation is that the PVLOO value of a GP indicates the mechanism diversity of a specific compound in QSAR/QSPR data sets.
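The idea of ranking compounds by their GP leave-one-out posterior variance can be illustrated with a minimal sketch. It is not the authors' implementation: the description does not specify the four PVLOO-based variants, so the example simply places the compounds with the highest variance in the training set. It assumes a squared exponential covariance with fixed, hand-picked hyperparameters (`length_scale`, `signal_var`, `noise_var`) rather than optimized ones, and it uses the standard closed-form GP result that the leave-one-out predictive variance of point i equals 1 / [K⁻¹]ᵢᵢ, where K is the covariance matrix including noise. The function names are hypothetical.

```python
import numpy as np


def squared_exponential(X, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance matrix for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return signal_var * np.exp(-0.5 * sq_dists / length_scale ** 2)


def loo_posterior_variance(X, noise_var=0.1, **kernel_params):
    """Leave-one-out predictive variance of each point under a GP.

    Uses the closed form var_i = 1 / [K^{-1}]_ii, with K the noisy
    covariance matrix. Hyperparameters are fixed here (an assumption).
    """
    K = squared_exponential(X, **kernel_params) + noise_var * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    return 1.0 / np.diag(K_inv)


def split_by_pvloo(X, n_train, **kwargs):
    """Put the n_train points with the highest LOO variance in the training set."""
    pv = loo_posterior_variance(X, **kwargs)
    order = np.argsort(-pv)  # descending variance
    return np.sort(order[:n_train]), np.sort(order[n_train:])


# Usage with random stand-in descriptors (not real QSAR data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
train_idx, test_idx = split_by_pvloo(X, n_train=15)
print("training indices:", train_idx)
print("external test indices:", test_idx)
```

With fixed hyperparameters, this variance depends only on the descriptor matrix and not on the measured activities, so compounds that are isolated in descriptor space receive the largest values and are selected first.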
Created:
2017-04-27