Denoising Drug Discovery Data for Improved Absorption, Distribution, Metabolism, Excretion, and Toxicity Property Prediction

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://figshare.com/articles/dataset/Denoising_Drug_Discovery_Data_for_Improved_Absorption_Distribution_Metabolism_Excretion_and_Toxicity_Property_Prediction/26508846

Description:

Predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of small molecules is a key task in drug discovery. A major obstacle to building better ADMET models is the experimental error inherent in the data. Moreover, because ADMET endpoints are continuous, ADMET prediction is typically framed as regression, which makes it difficult to apply existing denoising methods from other domains, since those methods largely target classification. Here, we develop deep-learning-based denoising schemes to address this problem. We find that the training error (TE) can identify noisy samples in regression tasks, whereas ensemble-based and forgetting-event-based metrics fail to detect the noise. The largest performance gain occurs when the original model is finetuned on data denoised using TE as the noise-detection metric. Our method improves models trained on data with a medium level of noise and does not degrade performance outside this range (the low- and high-noise regimes). To our knowledge, this is the first denoising scheme shown to improve model performance on ADMET data, and it has implications for improving models of experimental assay data in general.
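The abstract describes the TE-based scheme only in outline. The sketch below is a minimal illustration of training-error denoising for a regression task under loudly stated assumptions: a linear model trained by gradient descent stands in for the study's deep network, the threshold `noise_frac` (drop the top fraction of samples ranked by TE) is a hypothetical choice, and the data are synthetic with injected label errors. It trains once, scores every training sample by its squared training residual, removes the highest-TE samples, and then continues training from the original weights on the retained data, mirroring the "finetune the original model on denoised data" step.

```python
import numpy as np


def train_linear(X, y, w=None, lr=0.1, epochs=500):
    """Fit y ~ X @ w by full-batch gradient descent on mean squared error.

    Hypothetical stand-in for the study's deep network. Passing an existing
    `w` continues training from those weights, which plays the role of
    finetuning the original model.
    """
    n, d = X.shape
    w = np.zeros(d) if w is None else w.copy()
    for _ in range(epochs):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w


def training_error(X, y, w):
    """Per-sample squared residual of the model on its own training data (TE)."""
    return (X @ w - y) ** 2


def denoise_by_training_error(X, y, noise_frac=0.1, **train_kw):
    """Train, flag the top `noise_frac` of samples by TE, then finetune on the rest.

    `noise_frac` is a hypothetical threshold chosen for illustration.
    Returns (original weights, finetuned weights, boolean mask of flagged samples).
    """
    w0 = train_linear(X, y, **train_kw)
    te = training_error(X, y, w0)
    cutoff = np.quantile(te, 1.0 - noise_frac)
    keep = te <= cutoff
    w_ft = train_linear(X[keep], y[keep], w=w0, **train_kw)
    return w0, w_ft, ~keep


# Synthetic demo: clean linear data plus gross errors injected into 10% of labels.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)
noisy_idx = rng.choice(n, size=20, replace=False)
y[noisy_idx] += 5.0 * rng.choice([-1.0, 1.0], size=20)

w0, w_ft, flagged = denoise_by_training_error(X, y, noise_frac=0.1)
caught = np.intersect1d(np.flatnonzero(flagged), noisy_idx).size
print(f"flagged {flagged.sum()} samples, {caught} of {noisy_idx.size} injected errors")
print("weight error before:", np.linalg.norm(w0 - true_w))
print("weight error after: ", np.linalg.norm(w_ft - true_w))
```

For a convex model like this one, finetuning from the original weights and retraining from scratch converge to the same solution, so the distinction between the two only becomes meaningful for non-convex models such as the deep networks used in the study.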

Created:

2024-08-07