Data Scaling and Generalization Insights for Medicinal Chemistry Deep Learning Models
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Data_Scaling_and_Generalization_Insights_for_Medicinal_Chemistry_Deep_Learning_Models/29211992
Description:
Predictive models hold considerable promise in enabling the faster
discovery of safer, more efficacious therapeutics. To better understand
and improve the performance of small-molecule predictive models for
drug discovery, we conduct multiple experiments with deep learning
and traditional machine learning approaches, leveraging our large
internal data sets as well as publicly available data sets. The experiments
include assessing performance on random, temporal, and reverse-temporal
data ablation tasks as well as tasks testing model extrapolation to
different property spaces. We identify factors that contribute to
the higher performance of predictive models built using graph neural
networks compared to traditional methods such as XGBoost and random
forest. These insights were successfully used to develop a scaling
relationship that explains 81% of the variance in model performance
across various assays and data regimes. This relationship can be used
to estimate the performance of models for ADMET (absorption, distribution,
metabolism, excretion, and toxicity) end points, as well as for drug
discovery assay data more broadly. The findings offer guidance for
further improving model performance in drug discovery.
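
The random, temporal, and reverse-temporal data ablation tasks mentioned above can be illustrated with a minimal sketch. The listing does not define these tasks precisely, so the interpretation here is an assumption: "temporal" keeps the earliest measurements, "reverse-temporal" keeps the most recent ones, and "random" keeps a uniformly sampled subset. The function name `ablate`, its parameters, and the record layout are hypothetical and are not taken from the dataset.

```python
import random
from datetime import date


def ablate(records, fraction, mode, seed=0):
    """Return a reduced training set that keeps `fraction` of `records`.

    records:  list of (measurement_date, item) tuples (hypothetical layout).
    fraction: share of the data to keep, in (0, 1].
    mode:     "random", "temporal" (assumed: keep the earliest records), or
              "reverse_temporal" (assumed: keep the most recent records).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one record so that a model can still be trained.
    n_keep = max(1, int(len(records) * fraction))

    if mode == "random":
        # Seeded sampling so that the ablation is reproducible.
        return random.Random(seed).sample(records, n_keep)

    # Both temporal modes start from the records in chronological order.
    ordered = sorted(records, key=lambda r: r[0])
    if mode == "temporal":
        return ordered[:n_keep]
    if mode == "reverse_temporal":
        return ordered[-n_keep:]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Ten toy records measured on consecutive days.
    data = [(date(2020, 1, day), f"mol{day}") for day in range(1, 11)]
    for mode in ("random", "temporal", "reverse_temporal"):
        print(mode, [name for _, name in ablate(data, 0.5, mode)])
```

Training the same model on progressively smaller subsets from each mode, then scoring it on a fixed held-out set, is one way to compare how performance changes with the amount and recency of the data.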
Created:
2025-06-02



