Binary Classification of Aqueous Solubility Using Support Vector Machines with Reduction and Recombination Feature Selection
Source: Figshare. Updated: 2015-12-16. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/Binary_Classification_of_Aqueous_Solubility_Using_Support_Vector_Machines_with_Reduction_and_Recombination_Feature_Selection/2015112
Description:
Aqueous solubility is recognized as a critical parameter in both early- and late-stage drug discovery. In silico modeling of solubility has therefore attracted extensive interest in recent years. Most previous studies have been limited to relatively small data sets with limited diversity, which in turn limits the predictive power of the derived models. In this work, we present a support vector machine model for the binary classification of solubility that takes advantage of the largest known public data set, which contains over 46 000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and the prediction of two independent test sets, indicating that it could be a practical tool for selecting soluble compounds for screening, purchasing, and synthesis. Moreover, because it relies entirely on public resources, our work may be used for the comparative evaluation of solubility classification studies.
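The workflow named in the description (a support vector machine classifier assessed by cross-validation) can be illustrated with a minimal sketch. This is not the authors' model. It assumes a linear SVM trained by stochastic subgradient descent, synthetic three-feature vectors standing in for molecular descriptors, and illustrative "soluble"/"insoluble" labels. It does not implement the reduction and recombination feature selection strategy, whose details are not given here. All function names and parameter values are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=20, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            violated = y[i] * score < 1.0  # inside the margin or misclassified
            for j in range(len(w)):
                grad = lam * w[j] - (y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y[i]
    return w, b


def predict(model, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


def cross_validate(X, y, k=5, seed=0):
    """Return the held-out accuracy of each of k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def make_synthetic_data(n=200, seed=1):
    """Synthetic stand-in for descriptor vectors (assumption, not real solubility data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1  # +1 = "soluble", -1 = "insoluble" (illustrative)
        # Two informative features shifted by the label, plus one pure-noise feature.
        X.append([rng.gauss(2.0 * label, 1.0),
                  rng.gauss(2.0 * label, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic_data()
fold_accuracies = cross_validate(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(f"mean 5-fold accuracy: {mean_accuracy:.3f}")
```

In practice a kernel SVM from an established library, real descriptor vectors, and a separate independent test set would replace the synthetic pieces used here.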
Created:
2015-12-16



