Data Scaling and Generalization Insights for Medicinal Chemistry Deep Learning Models

Source: Figshare. Updated 2025-06-02; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Data_Scaling_and_Generalization_Insights_for_Medicinal_Chemistry_Deep_Learning_Models/29211992

Description:

Predictive models hold considerable promise in enabling the faster discovery of safer, more efficacious therapeutics. To better understand and improve the performance of small-molecule predictive models for drug discovery, we conduct multiple experiments with deep learning and traditional machine learning approaches, leveraging our large internal data sets as well as publicly available data sets. The experiments include assessing performance on random, temporal, and reverse-temporal data ablation tasks as well as tasks testing model extrapolation to different property spaces. We identify factors that contribute to the higher performance of predictive models built using graph neural networks compared to traditional methods such as XGBoost and random forest. These insights were successfully used to develop a scaling relationship that explains 81% of the variance in model performance across various assays and data regimes. This relationship can be used to estimate the performance of models for ADMET (absorption, distribution, metabolism, excretion, and toxicity) end points, as well as for drug discovery assay data more broadly. The findings offer guidance for further improving model performance in drug discovery.
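To make the three data ablation settings named in the description concrete, the following is a minimal sketch of how a training set might be reduced under each one. It is not the authors' procedure. The record schema (`compound_id`, `assay_date`), the function name `ablate`, and the direction of the two temporal modes are all assumptions made for illustration: here "temporal" keeps the oldest measurements and "reverse-temporal" keeps the newest, which is one plausible reading of the terms.

```python
import random
from datetime import date


def ablate(records, fraction, mode, seed=0):
    """Return roughly `fraction` of `records` under the given ablation mode.

    records:  list of (compound_id, assay_date) tuples (hypothetical schema).
    fraction: share of the data to keep, in (0, 1].
    mode:     "random", "temporal", or "reverse_temporal".
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []

    keep = max(1, round(len(records) * fraction))

    if mode == "random":
        # Uniform sample without replacement; seeded for reproducibility.
        return random.Random(seed).sample(records, keep)

    # Both temporal modes work on the records ordered by measurement date.
    ordered = sorted(records, key=lambda r: r[1])

    if mode == "temporal":
        # Assumption: keep the oldest measurements, drop the newest.
        return ordered[:keep]
    if mode == "reverse_temporal":
        # Assumption: keep the newest measurements, drop the oldest.
        return ordered[-keep:]

    raise ValueError(f"unknown mode: {mode!r}")


# Toy data only: ten made-up compounds measured on consecutive days.
records = [(f"cmpd{i}", date(2020, 1, 1 + i)) for i in range(10)]
subset = ablate(records, 0.3, "reverse_temporal")
```

Training the same model on subsets of increasing `fraction` and scoring each against a fixed held-out set is one way such ablations can be turned into a performance-versus-data-size curve.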

Created:

2025-06-02