Derify/augmented_canonical_pubchem_13m
Source: Hugging Face. Updated: 2025-09-09. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Derify/augmented_canonical_pubchem_13m
Description:
The Augmented Canonical PubChem 10M dataset is derived from the original PubChem 10M dataset. All entries were canonicalized with RDKit (2024.9.4) to ensure a consistent structural representation. To increase molecular diversity, 33% of the entries were randomly sampled and augmented with RDKit's Chem.MolToRandomSmilesVect function, which generates randomized SMILES strings. The dataset contains 13M SMILES in total.
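The canonicalize-then-augment procedure described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' actual pipeline: the function names (`rdkit_canonicalize`, `rdkit_randomize`, `augment`), the seeding scheme, and the choice to append the randomized strings to the canonical ones (assumed here because roughly 10M entries plus 33% gives about 13M) are all assumptions. Only `Chem.MolFromSmiles`, `Chem.MolToSmiles`, and `Chem.MolToRandomSmilesVect` are real RDKit calls.

```python
import random
from typing import Callable, List, Sequence


def rdkit_canonicalize(smiles: str) -> str:
    """Return RDKit's canonical SMILES for one input string."""
    # Imported lazily so the sampling logic below can run without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"unparsable SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


def rdkit_randomize(smiles: str, seed: int) -> str:
    """Return one randomized (non-canonical) SMILES for the same molecule."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"unparsable SMILES: {smiles!r}")
    # Ask for a single random SMILES; a positive seed makes it reproducible.
    return Chem.MolToRandomSmilesVect(mol, 1, randomSeed=seed)[0]


def augment(
    smiles: Sequence[str],
    canonicalize: Callable[[str], str],
    randomize: Callable[[str, int], str],
    fraction: float = 0.33,  # share of entries to augment, per the description
    seed: int = 0,           # hypothetical seed, not taken from the source
) -> List[str]:
    """Canonicalize every entry, then append randomized copies of a sample."""
    rng = random.Random(seed)
    canonical = [canonicalize(s) for s in smiles]
    n_aug = round(fraction * len(canonical))
    picked = rng.sample(range(len(canonical)), n_aug)
    extra = [randomize(canonical[i], rng.randrange(1, 2**31)) for i in picked]
    # Assumption: augmented strings are added on top of the canonical set.
    return canonical + extra
```

With RDKit installed, a call such as `augment(["OCC", "c1ccccc1O"], rdkit_canonicalize, rdkit_randomize)` would return the canonical strings followed by randomized variants of the sampled entries. Passing the two RDKit helpers as arguments keeps the sampling logic testable on its own.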
Provider:
Derify



