Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters

Creator: Taylor & Francis
Published: 2020-09-04 15:03:47
License: No description available

Source: DataCite Commons (updated 2020-09-04; indexed 2024-07-25)

Download link:

https://tandf.figshare.com/articles/dataset/Consistency_of_QSAR_models_Correct_split_of_training_and_test_sets_ranking_of_models_and_performance_parameters_a_href_FN0001_target_blank_a_/1569694

Description:

Recent implementations of QSAR modelling software provide the user with numerous models and a wealth of information. In this work, we provide some guidance on how one should interpret the results of QSAR modelling, compare and assess the resulting models, and select the best and most consistent ones. Two QSAR datasets are applied as case studies for the comparison of model performance parameters and model selection methods. We demonstrate the capabilities of sum of ranking differences (SRD) in model selection and ranking, and identify the best performance indicators and models. While the exchange of the original training and (external) test sets does not affect the ranking of performance parameters, it provides improved models in certain cases (despite the lower number of molecules in the training set). Performance parameters for external validation are clearly separated from the other figures of merit in SRD analyses, highlighting their value in data fusion.
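
The abstract relies on sum of ranking differences (SRD) to compare performance parameters and models. The sketch below illustrates the general SRD idea only; it is not the authors' implementation. Each column of a matrix (rows = objects such as models, columns = methods or performance parameters) is ranked, the ranks are compared with the ranks of a reference column, and the absolute rank differences are summed, so a smaller SRD means closer agreement with the reference. Assumptions: every column is oriented so that larger values are better (error metrics such as RMSE would have to be negated first); the default reference is the row-wise mean of min-max-scaled columns, which is one common consensus choice; ties receive average ranks; and SRD is normalized by floor(n²/2), the largest possible SRD for n ranked objects. The function names are hypothetical, and the randomization test and cross-validation usually reported alongside SRD are omitted.

```python
from typing import List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(column: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of absolute differences between the ranks of `column` and of `reference`."""
    return sum(abs(a - b) for a, b in zip(average_ranks(column), average_ranks(reference)))


def srd_table(
    matrix: Sequence[Sequence[float]],
    reference: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """Return (SRD, SRD as % of the maximum) for every column of `matrix`.

    Rows are objects (e.g. models); columns are methods or performance parameters.
    ASSUMPTION: every column is oriented so that larger values are better.
    """
    n_rows = len(matrix)
    if n_rows < 2:
        raise ValueError("SRD needs at least two objects (rows)")
    n_cols = len(matrix[0])
    columns = [[row[c] for row in matrix] for c in range(n_cols)]
    if reference is None:
        # ASSUMPTION: consensus reference = row-wise mean of min-max-scaled columns.
        scaled = []
        for col in columns:
            lo, hi = min(col), max(col)
            span = hi - lo
            scaled.append([(v - lo) / span if span else 0.0 for v in col])
        reference = [sum(s[r] for s in scaled) / n_cols for r in range(n_rows)]
    if len(reference) != n_rows:
        raise ValueError("reference must have one value per row")
    max_srd = n_rows * n_rows // 2  # largest possible SRD for n_rows ranked objects
    return [(d, 100.0 * d / max_srd) for d in (srd(col, reference) for col in columns)]


if __name__ == "__main__":
    # Hypothetical data: 4 models (rows) x 3 "larger is better" parameters (columns).
    data = [
        [0.90, 0.85, 0.70],
        [0.80, 0.80, 0.90],
        [0.70, 0.75, 0.80],
        [0.60, 0.70, 0.60],
    ]
    print(srd_table(data, reference=[4, 3, 2, 1]))
    # -> [(0.0, 0.0), (0.0, 0.0), (4.0, 50.0)]
```

In this made-up example only the third parameter orders the models differently from the reference, so it is the only column with a non-zero SRD. Passing an explicit reference (for example a known "gold standard" ordering) avoids the scaling-and-averaging assumption of the default consensus.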

Provider:

Taylor & Francis

Created:

2015-10-08