Benchmark Dataset for Symmetry-Corrected RMSD Tools (FlashRMSD Study)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15097620
Overview
This dataset accompanies our study benchmarking tools that calculate RMSD (root-mean-square deviation) for comparing molecular conformers. It contains curated sets of molecular poses designed to evaluate tool performance in realistic and structurally challenging cases.
The dataset is derived from two primary sources:
Chemical Component Dictionary (CCD) – 44,901 small molecules commonly used in macromolecular crystallography.
Biologically Interesting molecule Reference Dictionary (BIRD) – 805 biologically relevant non-polymeric molecules.
Each entry contains up to 9 conformations of the same molecule, organized for use in both pairwise-RMSD and cross-RMSD benchmarks. The dataset emphasizes:
Molecules with high structural diversity
Macrocyclic architectures and branching motifs
Challenging symmetric or pseudo-symmetric cases that test the limits of RMSD algorithms
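The quantity being benchmarked can be illustrated with a brute-force sketch: a symmetry-corrected RMSD is the minimum RMSD over all atom mappings that the molecule's symmetry allows, and a cross-RMSD table applies it to every pair of poses (at most 9 × 8 / 2 = 36 pairs per molecule here). This is a minimal plain-Python illustration, not the FlashRMSD method or any specific tool's algorithm. It assumes that the poses already share a coordinate frame (no superposition is applied) and that the caller supplies the symmetry mappings, which in practice would come from the automorphisms of the molecular graph; all function names are hypothetical.

```python
import itertools
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Plain RMSD between two coordinate lists in the same atom order.

    Assumption: both poses are already in a shared frame, so no
    superposition (e.g. Kabsch alignment) is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(sq / len(a))


def symmetry_corrected_rmsd(ref, probe, mappings):
    """Minimum RMSD over symmetry mappings.

    Each mapping m says that reference atom i corresponds to probe atom m[i].
    The mappings are supplied by the caller (hypothetically derived from
    graph automorphisms); the identity mapping should be included.
    """
    return min(rmsd(ref, [probe[j] for j in m]) for m in mappings)


def cross_rmsd(poses, mappings):
    """Symmetry-corrected RMSD for every unordered pair (i, j) with i < j."""
    return {
        (i, j): symmetry_corrected_rmsd(poses[i], poses[j], mappings)
        for i, j in itertools.combinations(range(len(poses)), 2)
    }


# Toy example: atom 0 is central, atoms 1 and 2 are symmetry-equivalent.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
probe = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # labels 1 and 2 swapped
maps = [(0, 1, 2), (0, 2, 1)]
print(round(rmsd(ref, probe), 3))                 # 1.633
print(symmetry_corrected_rmsd(ref, probe, maps))  # 0.0
```

With two equivalent terminal atoms swapped between poses, plain RMSD is about 1.63 (in coordinate units) while the symmetry-corrected value is 0. Because the number of valid mappings can grow combinatorially for highly symmetric molecules, enumerating them all like this scales poorly, which is one reason such cases stress RMSD tools.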
Dataset Structure
The dataset is organized first by source repository and then by molecule, so the same poses can be used across different benchmarking scenarios.
📁 Top-Level Directory Organization
Molecules are grouped by their source repository:
CCD/[MOLECULE_ID]/ – for molecules sourced from the Chemical Component Dictionary
BIRD/[MOLECULE_ID]/ – for molecules sourced from the Biologically Interesting molecule Reference Dictionary
📂 Per-Molecule Directory Structure
Each molecule is stored in its own folder named after its unique identifier. Inside each folder, the following files are provided:
all_poses.sdf – Multi-conformer file in SDF format (all poses in one file)
all_poses.mol2 – Multi-conformer file in MOL2 format
pose_X.sdf – Individual pose file (pose X) in SDF format
pose_X.mol2 – Individual pose file (pose X) in MOL2 format
This layout supports both tools that read a single multi-conformer file and tools that require each pose of a pair in a separate file.
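The layout above can be traversed with a short standard-library sketch. It assumes the `CCD/` and `BIRD/` folders sit directly under the dataset root and that the X in `pose_X.sdf` is an integer index; the helper names `split_sdf` and `iter_molecules` are hypothetical. Records in a multi-record SDF file are terminated by `$$$$` lines, so `all_poses.sdf` can be split into its poses on that delimiter.

```python
import re
from pathlib import Path

# "$$$$" terminates each record in an SDF file.
SDF_RECORD_END = "$$$$"
# Assumption: the X in pose_X.sdf is an integer index.
POSE_RE = re.compile(r"^pose_(\d+)\.sdf$")


def split_sdf(path):
    """Split a multi-record SDF file into a list of record strings."""
    records, current = [], []
    for line in Path(path).read_text().splitlines():
        if line.strip() == SDF_RECORD_END:
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a final record even if the file lacks a trailing "$$$$".
    if any(ln.strip() for ln in current):
        records.append("\n".join(current))
    return records


def iter_molecules(root):
    """Yield (source, molecule_id, all_poses.sdf path, pose files sorted by index)."""
    for source in ("CCD", "BIRD"):
        src_dir = Path(root) / source
        if not src_dir.is_dir():
            continue
        for mol_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
            indexed = []
            for p in mol_dir.iterdir():
                m = POSE_RE.match(p.name)
                if m:
                    indexed.append((int(m.group(1)), p))
            # Numeric sort so pose_10 would follow pose_2, not pose_1.
            poses = [p for _, p in sorted(indexed)]
            yield source, mol_dir.name, mol_dir / "all_poses.sdf", poses
```

For full parsing into molecule objects, a cheminformatics toolkit such as RDKit's `Chem.SDMolSupplier` is the usual choice; this sketch only locates and splits the files.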
A results.csv file is also included, containing the ground-truth cross-RMSD values for each molecule in the dataset. Fields for which RMSD values could not be computed are left blank.
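A minimal sketch for loading `results.csv` while keeping blank (uncomputed) cells distinct from numeric values. The column layout is not documented here, so the sketch assumes only that the first column identifies the molecule; the helper names `parse_cell` and `load_results` are hypothetical. Blank cells become `None` rather than `0.0`, so a missing RMSD cannot be mistaken for a perfect match.

```python
import csv


def parse_cell(raw):
    """Return None for a blank cell (RMSD not computed), a float if numeric, else the raw text."""
    raw = (raw or "").strip()  # short rows yield None from DictReader
    if raw == "":
        return None
    try:
        return float(raw)
    except ValueError:
        return raw  # non-numeric column (layout unknown; kept as text)


def load_results(fh):
    """Map the first-column identifier to {column_name: parsed value} for the other columns."""
    reader = csv.DictReader(fh)
    key_col, *value_cols = reader.fieldnames
    return {row[key_col]: {c: parse_cell(row[c]) for c in value_cols} for row in reader}
```

Open the file with `open("results.csv", newline="")`, as the `csv` module documentation recommends, and pass the handle to `load_results`.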
Created:
2025-03-28



