Extreme Gradient Boosting as a Method for Quantitative Structure–Activity Relationships

NIAID Data Ecosystem, indexed 2026-03-09

Download link:

https://figshare.com/articles/dataset/Extreme_Gradient_Boosting_as_a_Method_for_Quantitative_Structure_Activity_Relationships/4312529

Description:

In the pharmaceutical industry, it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that generate the most accurate predictions without being overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions that are, on average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed. Whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single CPU in less than a third of the wall-clock time of either of the other methods.
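For readers unfamiliar with the idea behind gradient boosting, the following is a minimal sketch in plain Python, not the XGBoost library and not the authors' setup. It fits one-split trees (stumps) to the residuals of the running prediction under squared-error loss. The descriptor matrix `x`, the activities `y`, and the values `n_rounds=50` and `learning_rate=0.1` are made-up illustrations, not the standard parameters the abstract refers to.

```python
# Minimal gradient boosting for regression with squared-error loss.
# Illustrative only: XGBoost adds regularization, second-order gradient
# information, row/column subsampling, and far faster tree construction.

def fit_stump(x, residuals):
    """Return the one-split tree (feature, threshold, left mean, right mean)
    that minimizes squared error on the residuals, or None if no split exists."""
    best = None
    for j in range(len(x[0])):
        values = sorted(set(row[j] for row in x))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [r for row, r in zip(x, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(x, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]

def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean activity, then repeatedly fit a stump to the
    current residuals and add a shrunken copy of it to the prediction."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, x)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Hypothetical toy data: six "molecules", two descriptors each, one activity.
x = [[0.1, 1.0], [0.4, 0.0], [0.5, 1.0], [0.9, 0.0], [1.2, 1.0], [1.5, 0.0]]
y = [1.0, 1.2, 2.0, 2.9, 3.1, 4.0]

model = fit_boosting(x, y)
print([round(predict(model, row), 3) for row in x])
```

Each round can only lower the training error here, because a shrunken least-squares stump is added to the current prediction. Real QSAR work would use held-out molecules to judge accuracy, which this sketch does not do.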

Created:

2016-12-13
