PharmaBench: Enhancing ADMET benchmarks with large language models

Source: Figshare. Updated 2024-04-07; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/PharmaBench_Enhancing_ADMET_benchmarks_with_large_language_models/25559469

Description:

Accurately predicting ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties early in drug development is essential for selecting compounds with optimal pharmacokinetics and minimal toxicity. Existing benchmark sets have limited utility for AI modeling due to small dataset sizes and a lack of representation of compounds. To address this issue, we propose a multi-agent data mining system based on Large Language Models to effectively identify experimental conditions within 14,401 bioassays. This assists in merging entries from different sources, resulting in the creation of PharmaBench. This collection includes eleven datasets and 52,482 entries.

Created:

2024-04-07
