SDDF Energy Dataset

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14008356
Description:
This conformational energy dataset, developed as part of the Smart Distributed Data Factory (SDDF) project, contains over 2.17 million molecular conformations of drug-like molecules sourced from the ENAMINE database. Energies were calculated using DFT with the ωB97x density functional and the 6-31G(d) basis set. The conformations were generated from SMILES using RDKit, MMFF94 optimization, and molecular dynamics (MD) simulations, providing a diverse set of molecular structures and energy states.

Conformations by generation method:
RDKit Conformations: 535,338
RDKit + MMFF94 Optimized: 1,151,936
MD-Generated: 483,279

The dataset serves as a benchmark for energy prediction models, with training (638,617 examples), validation (134,732 examples), and test (24,890 examples) subsets. The subsets were created using a strict scaffold-based split to ensure no overlap and less than 70% similarity between the training and test sets.

Dataset contents:
data.tar.gz: contains the conformations in structure-data file (SDF) format, grouped into separate folders by molecule ID.
INDEX.smi: specifies the molecule IDs and their corresponding SMILES.
SOURCES.csv: specifies the generation method for each conformation.
SDDF_train.tsv, SDDF_validation.tsv, SDDF_test.tsv: specify the molecule IDs and conformations for each subset of the benchmark.

A detailed description is provided in the accompanying paper.
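The counts quoted in the description can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the numbers are copied from the text above, the variable names are illustrative, and the listing does not say what unit an "example" in the benchmark subsets corresponds to, so the two totals are reported separately rather than compared.

```python
# Counts copied from the dataset description; names are illustrative only.
conformations_by_source = {
    "RDKit": 535_338,
    "RDKit + MMFF94 optimized": 1_151_936,
    "MD-generated": 483_279,
}
benchmark_examples = {
    "train": 638_617,
    "validation": 134_732,
    "test": 24_890,
}

total_conformations = sum(conformations_by_source.values())
total_examples = sum(benchmark_examples.values())

# Share of each generation method among all conformations.
print(f"total conformations: {total_conformations:,}")
for name, count in conformations_by_source.items():
    print(f"  {name}: {count / total_conformations:.1%}")

# Share of each subset among all benchmark examples.
print(f"total benchmark examples: {total_examples:,}")
for name, count in benchmark_examples.items():
    print(f"  {name}: {count / total_examples:.1%}")
```

The per-method counts sum to 2,170,553, which is consistent with the "over 2.17 million" figure in the description.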
Created:
2024-10-29