Contemporary QSAR Classifiers Compared

Indexed in NIAID Data Ecosystem: 2026-03-06
Download link:
https://figshare.com/articles/dataset/Contemporary_QSAR_Classifiers_Compared/3031792
Description:
We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.
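The abstract cites "rigorous multiple comparison statistical tests" across eight data sets but does not name the test. One common choice for comparing several classifiers over several data sets is the Friedman test on per-data-set ranks. The sketch below assumes that test; it is not confirmed as the authors' procedure. The function names `rank_row` and `friedman` are hypothetical, and the accuracies are made-up placeholders rather than results from the study.

```python
from typing import Dict, List, Tuple


def rank_row(scores: List[float]) -> List[float]:
    """Rank one data set's scores: 1 = best (highest); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' over a run of tied scores starting at 'pos'.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = avg_rank
        pos = end + 1
    return ranks


def friedman(results: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Return (average rank per method, Friedman chi-square statistic).

    'results' maps method name -> one score per data set, in the same
    data-set order for every method. No tie correction is applied.
    """
    methods = list(results)
    k = len(methods)                 # number of classifiers
    n = len(results[methods[0]])     # number of data sets
    totals = [0.0] * k
    for d in range(n):
        row = [results[m][d] for m in methods]
        for j, r in enumerate(rank_row(row)):
            totals[j] += r
    avg = {m: totals[j] / n for j, m in enumerate(methods)}
    sum_sq = sum(r * r for r in avg.values())
    chi2 = 12 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)
    return avg, chi2


# Hypothetical accuracies on three data sets (placeholders, not study results).
results = {
    "tree":          [0.70, 0.65, 0.72],
    "svm":           [0.80, 0.78, 0.79],
    "boosting":      [0.79, 0.77, 0.81],
    "bagging":       [0.78, 0.77, 0.78],
    "random_forest": [0.81, 0.79, 0.80],
}

avg, chi2 = friedman(results)
print(sorted(avg.items(), key=lambda kv: kv[1]))  # best (lowest) average rank first
print(f"Friedman chi-square: {chi2:.2f}")          # Friedman chi-square: 9.80
```

With k methods, the statistic is compared against a chi-square distribution with k - 1 degrees of freedom (about 9.49 at α = 0.05 for k = 5). A significant result only says that the methods are not all equivalent, so it is usually followed by a post-hoc pairwise test. With as few as three data sets the chi-square approximation is rough; the small example is kept only so the arithmetic can be checked by hand.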

Created:
2016-02-29