Derify/augmented_canonical_druglike_QED_Pfizer_15m
Hugging Face · Updated 2025-09-09 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Derify/augmented_canonical_druglike_QED_Pfizer_15m
Description:
---
license: cc-by-4.0
task_categories:
- feature-extraction
tags:
- chemistry
- smiles
- cheminformatics
pretty_name: Augmented Canonical Druglike QED Pfizer 15M
size_categories:
- 10M<n<100M
---
# Druglike QED Pfizer 15M - Augmented SMILES Dataset
This dataset is derived from [Druglike molecule datasets for drug discovery](https://zenodo.org/records/7547717), and every SMILES string has been canonicalized with RDKit (2024.9.4) so that each structure has one consistent representation.
To increase the diversity of SMILES representations, 33% of the dataset was randomly sampled and augmented using RDKit’s `Chem.MolToRandomSmilesVect` function, following an approach similar to NVIDIA's *molmim* method for SMILES augmentation.
### Dataset Overview:
- **Source:** [10.5281/zenodo.7547717](https://doi.org/10.5281/zenodo.7547717)
- **Canonicalization:** RDKit (2024.9.4)
- **Augmentation:** Random SMILES generation for 33% of the original dataset
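
The canonicalization and augmentation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the exact pipeline used to build the dataset: the helper names `canonicalize` and `augment`, the seeding scheme, and the choice of one randomized SMILES per sampled molecule are all assumptions.

```python
import random

from rdkit import Chem


def canonicalize(smiles):
    """Return RDKit's canonical SMILES, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def augment(canonical_smiles, fraction=0.33, seed=0):
    """Return one randomized SMILES for a random `fraction` of the input list.

    Assumption: one random SMILES per sampled molecule; the dataset card
    does not state how many were generated per molecule.
    """
    rng = random.Random(seed)
    n_sampled = int(len(canonical_smiles) * fraction)
    sampled = rng.sample(canonical_smiles, n_sampled)

    augmented = []
    for smi in sampled:
        mol = Chem.MolFromSmiles(smi)
        # A positive randomSeed makes the randomized output reproducible.
        mol_seed = rng.randrange(1, 2**31 - 1)
        augmented.extend(Chem.MolToRandomSmilesVect(mol, 1, randomSeed=mol_seed))
    return augmented


if __name__ == "__main__":
    raw = ["OCC", "c1ccccc1O", "CC(=O)O"]
    canonical = [canonicalize(s) for s in raw]
    print(canonical)
    print(augment(canonical, fraction=0.67, seed=1))
```

Each randomized string encodes the same molecule as its canonical form, so passing it back through `canonicalize` recovers the original entry.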
Provider:
Derify



