Rivality index neighbourhood algorithm with density and distances weighted schemes for the building of robust QSAR classification models with high reliable applicability domain
Source: NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://figshare.com/articles/dataset/Rivality_index_neighbourhood_algorithm_with_density_and_distances_weighted_schemes_for_the_building_of_robust_QSAR_classification_models_with_high_reliable_applicability_domain/9752816
Resource description:
The rivality index (RI) is a normalized distance measure between a molecule and its first nearest neighbours that provides a robust prediction of the molecule's activity from the known activity of those neighbours. Negative RI values describe molecules that would be correctly classified by a statistical algorithm; conversely, positive values describe molecules that classification algorithms detect as outliers. This paper describes a classification algorithm based on the RI and proposes four weighting schemes (kernels) for its calculation, each measuring different characteristics of the neighbourhood of every molecule in the dataset at established values of the neighbour threshold. The results show that the proposed RI-based classification algorithm generates more reliable and robust classification models than many of the most widely used machine learning algorithms. These results were validated on 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The resulting classification models provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.
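The description does not give the formula for the RI, so the following is only a minimal sketch of one plausible reading: for each molecule, compare the distance to its nearest neighbour of the same class with the distance to its nearest neighbour of a different class, and normalize by their sum. The function name `rivality_index`, the Euclidean metric and the exact normalization are assumptions made for illustration; the four weighting kernels mentioned in the description are not reproduced here. The sketch also assumes every class has at least two members and that there are no duplicate points.

```python
import numpy as np

def rivality_index(X, y):
    """Hypothetical RI sketch: (d_same - d_rival) / (d_same + d_rival).

    X: (n_samples, n_features) descriptor matrix.
    y: (n_samples,) class labels.
    Assumes Euclidean distance, >= 2 members per class, no duplicate rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all molecules.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a molecule is not its own neighbour
    same = y[:, None] == y[None, :]
    # Nearest neighbour with the same label, and nearest "rival" neighbour.
    d_same = np.where(same, dist, np.inf).min(axis=1)
    d_rival = np.where(~same, dist, np.inf).min(axis=1)
    return (d_same - d_rival) / (d_same + d_rival)

# Toy data: two compact clusters plus one class-1 point lying near class 0.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [3, 3]]
y = [0, 0, 0, 1, 1, 1]
ri = rivality_index(X, y)
print(np.round(ri, 3))
```

Under this reading the index lies in [-1, 1]: the five clustered points get negative values, and the stray class-1 point gets a positive value, matching the sign convention in the description.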
Created:
2019-08-30



