Druglike molecule datasets for drug discovery

NIAID Data Ecosystem, indexed 2026-03-14

Download link:

https://zenodo.org/record/7547716

Description:

Background: Transformer-based AI models have shown outstanding performance in identifying druggable candidate molecules. In most cases, models are trained on a massive database of molecular information to capture the latent meaning of a given molecule. However, the desirable properties of candidate molecules include the feasibility of synthesizing them, low toxicity, and high druggability. In this study, we injected prior knowledge of these desirable properties during the training process.

Methods: Using the PubChem database (100 M), we filtered druglike molecules based on the quantitative estimate of drug-likeness (QED) score and the Pfizer rule. With this dataset of druglike molecules, we trained both a molecular representation model (ChemBERTa) and a molecular generation model (MolGPT). The molecular representation model was evaluated by fine-tuning it on the MoleculeNet benchmark datasets, and the molecular generation model was evaluated on 10 K generated samples.

Results: Training with druglike molecules enabled the generation of molecules with desirable properties without any conditioning. Although the overall performance of the molecular representation model was not remarkable, its performance in predicting clinical toxicity exceeded that of conventional molecular representation models.

Conclusion: By training on a dataset of druglike molecules, our approach enables molecular representation models to predict clinical toxicity more precisely. Furthermore, it enables the molecular generation model to generate molecules with desirable druglike properties without any conditional generation procedure.

Loading the dataset:

    import pickle

    # Load the pickled dataset of druglike molecules
    with open("druglike_molecules_QED.pkl", "rb") as f:
        data = pickle.load(f)
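The filtering step described in the Methods could look roughly like the sketch below. It rests on several assumptions that the description does not confirm: the "Pfizer rule" is taken to be the Pfizer 3/75 rule (a compound is flagged when logP > 3 and TPSA < 75), the QED cutoff `QED_MIN = 0.5` is a hypothetical value, and the names `MolDescriptors`, `violates_pfizer_3_75`, `is_druglike`, and `filter_druglike` are illustrative only. The descriptor values are passed in precomputed; in practice they might be obtained with RDKit (for example `QED.qed`, `Descriptors.MolLogP`, and `Descriptors.TPSA`).

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cutoff: the dataset description does not state the QED threshold used.
QED_MIN = 0.5


@dataclass
class MolDescriptors:
    """Precomputed descriptors for one molecule (illustrative container)."""
    name: str    # identifier, e.g. a SMILES string
    qed: float   # QED score in [0, 1]
    logp: float  # octanol-water partition coefficient (logP)
    tpsa: float  # topological polar surface area


def violates_pfizer_3_75(logp: float, tpsa: float) -> bool:
    """Assumed reading of the 'Pfizer rule': flag when logP > 3 and TPSA < 75."""
    return logp > 3.0 and tpsa < 75.0


def is_druglike(d: MolDescriptors, qed_min: float = QED_MIN) -> bool:
    """Keep a molecule if its QED meets the cutoff and the 3/75 rule is not violated."""
    return d.qed >= qed_min and not violates_pfizer_3_75(d.logp, d.tpsa)


def filter_druglike(records: Iterable[MolDescriptors],
                    qed_min: float = QED_MIN) -> List[MolDescriptors]:
    """Return only the records that pass both filters, preserving input order."""
    return [d for d in records if is_druglike(d, qed_min)]
```

Separating the rule logic from the descriptor computation keeps the thresholds easy to change and lets the filter be checked without a cheminformatics toolkit installed.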

Created:

2023-01-18
