Derify/augmented_canonical_druglike_QED_43M
Source: Hugging Face. Updated: 2025-09-09. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Derify/augmented_canonical_druglike_QED_43M
Description:
The Augmented Canonical Druglike QED 36M dataset is derived from the Druglike molecule datasets and has been canonicalized with RDKit (version 2024.9.4) to ensure structural consistency. To increase molecular diversity, 33% of the dataset was randomly sampled and augmented with random SMILES generated by RDKit's Chem.MolToRandomSmilesVect function, an approach similar to NVIDIA's MolMIM method for SMILES augmentation. The dataset contains a total of 43M SMILES.
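The canonicalize-then-augment procedure described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' actual pipeline: the helper names (`rdkit_canonical`, `rdkit_random_smiles`, `augment_fraction`), the seed handling, and the choice to append the random variants to the canonical entries are all assumptions. Only `Chem.MolFromSmiles`, `Chem.MolToSmiles`, and `Chem.MolToRandomSmilesVect` are RDKit calls; RDKit is imported lazily so the sampling logic can be tried without it.

```python
import random
from typing import Callable, List, Optional


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    from rdkit import Chem  # lazy import: only needed when this helper is called

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToSmiles(mol)


def rdkit_random_smiles(smiles: str, seed: int = 0) -> Optional[str]:
    """Return one randomized (non-canonical) SMILES for the same molecule."""
    from rdkit import Chem  # lazy import, as above

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # MolToRandomSmilesVect(mol, numSmiles, randomSeed=...) returns a list of strings.
    return Chem.MolToRandomSmilesVect(mol, 1, randomSeed=seed)[0]


def augment_fraction(
    smiles_list: List[str],
    randomize: Callable[[str], Optional[str]],
    fraction: float = 0.33,
    seed: int = 0,
) -> List[str]:
    """Sample `fraction` of the entries and append one randomized variant of each.

    Assumption: variants are appended to the original entries rather than
    replacing them; the source does not say which was done.
    """
    rng = random.Random(seed)
    n_pick = round(len(smiles_list) * fraction)
    picked = rng.sample(range(len(smiles_list)), n_pick)
    extra = []
    for i in picked:
        variant = randomize(smiles_list[i])
        if variant is not None:  # skip entries the randomizer could not handle
            extra.append(variant)
    return list(smiles_list) + extra
```

With RDKit installed, a call might look like `augment_fraction([rdkit_canonical(s) for s in raw], rdkit_random_smiles)`, where `raw` is a hypothetical list of input SMILES that all parse successfully.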
Provider:
Derify



