
kohbanye/crossdocked2020

Source: Hugging Face. Updated 2026-04-16. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/kohbanye/crossdocked2020
Description:
---
license: cc0-1.0
task_categories:
- other
tags:
- drug-discovery
- molecular-generation
- protein-ligand
- structural-biology
size_categories:
- 1M<n<10M
---

# CrossDocked2020

Pre-processed CrossDocked2020 dataset containing raw receptor PDB and ligand SDF.gz files, organized for efficient loading.

## Dataset Summary

- **Unique pairs**: 25,092,018
- **Unique receptor PDB files**: 24,533
- **Source types**: cdonly, it0, it2_redocked
- **Fold splits**: 3 folds (0, 1, 2) per source type category

## Repository Structure

```
receptors/          Unique receptor PDB files in tar.gz archives
ligands/            Ligand SDF.gz files in tar shards (WebDataset-compatible)
manifest.parquet    Pair index with metadata and fold split info
```

## Ligand Tar Shard Format

Each shard is a tar file containing a pair of files per sample:

- `{pair_idx:07d}.sdf.gz`: original ligand SDF.gz (all conformers)
- `{pair_idx:07d}.json`: metadata (receptor_path, complex_dir, source_type)

## Manifest Schema

| Column | Type | Description |
|--------|------|-------------|
| pair_idx | uint32 | Global unique pair ID |
| complex_dir | string | Complex directory name |
| receptor_pdb | string | Receptor PDB filename |
| ligand_sdf_gz | string | Ligand SDF.gz filename |
| source_type | string | cdonly / it0 / it2_redocked |
| shard_idx | uint16 | Ligand shard number |
| label | int8 | Types file label (0/1) |
| score1 | float32 | Types file score 1 |
| score2 | float32 | Types file score 2 |
| {cat}_fold{n} | string | "train" / "test" per category and fold |

## Original Source

Paul G. Francoeur, Tomohide Masuda, Jocelyn Sunseri, Andrew Jia, Richard B. Iovanisci, Ian Snyder, David R. Koes. *J. Chem. Inf. Model.* 2020, 60(9), pp. 4200–4215.
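
The ligand tar shard format described above can be read with the Python standard library alone. The following is a minimal sketch, not the dataset author's loader. It assumes that the two members of each sample sit at the top level of the tar file under exactly the `{pair_idx:07d}` names given in the card, and that the SDF text decodes as UTF-8. The helper names `member_names`, `read_pair`, and `build_demo_shard` are hypothetical, and the demo shard holds synthetic placeholder data rather than a real ligand.

```python
import gzip
import io
import json
import tarfile


def member_names(pair_idx: int) -> tuple[str, str]:
    """Return the (ligand, metadata) member names for one pair, zero-padded to 7 digits."""
    stem = f"{pair_idx:07d}"
    return f"{stem}.sdf.gz", f"{stem}.json"


def read_pair(shard: tarfile.TarFile, pair_idx: int) -> tuple[str, dict]:
    """Read one sample from an open shard: decompressed SDF text and its metadata dict.

    Assumption: members are stored at the top level of the tar file.
    """
    sdf_name, json_name = member_names(pair_idx)
    sdf_text = gzip.decompress(shard.extractfile(sdf_name).read()).decode("utf-8")
    metadata = json.loads(shard.extractfile(json_name).read())
    return sdf_text, metadata


def build_demo_shard(pair_idx: int, sdf_text: str, metadata: dict) -> bytes:
    """Build a tiny in-memory tar with the same member layout (synthetic data only)."""
    buffer = io.BytesIO()
    sdf_name, json_name = member_names(pair_idx)
    payloads = (
        (sdf_name, gzip.compress(sdf_text.encode("utf-8"))),
        (json_name, json.dumps(metadata).encode("utf-8")),
    )
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in payloads:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Round trip on synthetic data; the metadata keys follow the card, the values are made up.
demo_metadata = {
    "receptor_path": "placeholder_receptor.pdb",
    "complex_dir": "placeholder_complex",
    "source_type": "it0",
}
demo_bytes = build_demo_shard(42, "placeholder ligand record\n$$$$\n", demo_metadata)

with tarfile.open(fileobj=io.BytesIO(demo_bytes), mode="r") as shard:
    demo_sdf, demo_meta = read_pair(shard, 42)
```

For a real shard, `tarfile.open(path, mode="r")` on a downloaded file would replace the in-memory buffer. Looking a member up by name makes `tarfile` scan the archive index, so iterating over the shard sequentially is likely the better choice when every sample is needed.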
Provided by:
kohbanye