Benchmark Dataset for Symmetry-Corrected RMSD Tools (FlashRMSD Study)

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15097620
Description:
Overview

This dataset accompanies our research on benchmarking RMSD (Root Mean Square Deviation) calculation tools in the context of molecular conformer comparison. It contains curated molecular pose sets specifically designed to evaluate tool performance under realistic and structurally challenging scenarios.

The dataset is derived from two primary sources:

- Chemical Component Dictionary (CCD): 44,901 small molecules commonly used in macromolecular crystallography.
- Biologically Interesting molecule Reference Dictionary (BIRD): 805 biologically relevant non-polymeric molecules.

Each entry includes multiple (at most 9) conformations of the same molecule, structured for use in both pairwise RMSD and cross-RMSD benchmarks.

The dataset emphasizes:

- Molecules with high structural diversity
- Macrocyclic architectures and branching motifs
- Challenging symmetric or pseudo-symmetric cases that test the limits of RMSD algorithms

Dataset Structure

The dataset is systematically organized to ensure ease of access and usability across different benchmarking scenarios.

📁 Top-Level Directory Organization

Molecules are grouped by their source repository:

- CCD/[MOLECULE_ID]/ : molecules sourced from the Chemical Component Dictionary
- BIRD/[MOLECULE_ID]/ : molecules sourced from the Biologically Interesting molecule Reference Dictionary

📂 Per-Molecule Directory Structure

Each molecule is stored in its own folder named after its unique identifier. Inside each folder, the following files are provided:

- all_poses.sdf : multi-conformer file in SDF format (all poses in one file)
- all_poses.mol2 : multi-conformer file in MOL2 format
- pose_X.sdf : individual pose file (pose X) in SDF format
- pose_X.mol2 : individual pose file (pose X) in MOL2 format

This structure supports both tools that operate on single-file multi-conformer input and those that require pose-pair file separation.
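The directory layout described above can be walked with the Python standard library alone. The following is a minimal sketch, not part of the dataset: the function names (`iter_molecules`, `list_pose_files`, `pose_pairs`) are hypothetical, and it assumes that the X in pose_X is an integer index and that the archive has been unpacked so that CCD/ and BIRD/ sit directly under one root folder.

```python
from itertools import combinations
from pathlib import Path


def iter_molecules(dataset_root, source):
    """Return the per-molecule folders under CCD/ or BIRD/, sorted by name."""
    root = Path(dataset_root) / source
    return sorted(p for p in root.iterdir() if p.is_dir())


def list_pose_files(molecule_dir, ext="sdf"):
    """Return the individual pose files of one molecule, sorted by pose index.

    Assumption: files are named pose_X.<ext> with X an integer.
    all_poses.<ext> does not match the pattern and is skipped.
    """
    def pose_index(path):
        # "pose_10" -> "10"
        return path.stem.split("_", 1)[1]

    files = [
        p for p in Path(molecule_dir).glob(f"pose_*.{ext}")
        if pose_index(p).isdigit()
    ]
    # Sort numerically so that pose_10 comes after pose_2.
    return sorted(files, key=lambda p: int(pose_index(p)))


def pose_pairs(molecule_dir, ext="sdf"):
    """Return every unordered pair of pose files for one molecule."""
    return list(combinations(list_pose_files(molecule_dir, ext), 2))
```

With at most 9 poses per molecule, `pose_pairs` yields at most 36 unordered pairs, which is the input shape expected by tools that compare two separate files at a time. Tools that accept multi-conformer input can be pointed at all_poses.sdf or all_poses.mol2 instead.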
A results.csv file is included, containing the ground truth cross-RMSD values for each molecule in the dataset. For entries where RMSD values could not be computed, the corresponding fields are left blank.
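Because blank fields mark RMSD values that could not be computed, a reader of results.csv should keep "missing" distinct from 0.0. The sketch below is one way to do that with the standard `csv` module. The column layout of results.csv is not described here, so the code makes no assumption about column names; `read_results` and `as_float` are hypothetical helper names.

```python
import csv


def read_results(path):
    """Read a CSV file into a list of dicts, mapping blank fields to None.

    Values are left as strings on purpose: an alphanumeric molecule
    identifier that happens to look like a number would otherwise be
    silently converted. Convert only the columns known to hold RMSDs.
    """
    rows = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            cleaned = {}
            for key, value in row.items():
                if key is None:
                    # Extra fields beyond the header row; ignored in this sketch.
                    continue
                if value is None or value.strip() == "":
                    cleaned[key] = None
                else:
                    cleaned[key] = value.strip()
            rows.append(cleaned)
    return rows


def as_float(value):
    """Convert a field from read_results to float, keeping None as None."""
    return None if value is None else float(value)
```

A caller would apply `as_float` only to the columns it has identified as RMSD values after inspecting the header of results.csv, and skip or report entries where the result is None.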
Created:
2025-03-28