RosettaCommons/NCI_Open_Compounds

Name: RosettaCommons/NCI_Open_Compounds
Creator: RosettaCommons
Published: 2026-03-05 18:33:08
License: No description available

Hugging Face | Updated 2026-03-05 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/RosettaCommons/NCI_Open_Compounds

Resource overview:

---
language:
- en
tags:
- chemistry
- biology
pretty_name: >-
  Processed NCI Open Compounds Structures for Docking, Cofold, and Affinity
  Prediction
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: nci_compounds.tsv
  delimiter: "\t"
---

# Curated NCI Open Compounds dataset

A curated set of the NCI Open Compounds with compatible mol2 and pdbqt files safe for cofolding and docking applications.

## Quickstart Usage

### Install HuggingFace Datasets package

Each subset can be loaded into Python using the Hugging Face [datasets](https://huggingface.co/docs/datasets/index) library.

First, from the command line, install the `datasets` library:

    $ pip install datasets

Optionally set the cache directory, e.g.

    $ HF_HOME=${HOME}/.cache/huggingface/
    $ export HF_HOME

Then, from within Python, load the `datasets` library:

    >>> import datasets

### Load model datasets

To load one of the `NCI_Open_Compounds` model datasets, use `datasets.load_dataset(...)`:

    >>> dataset_tag = "train"
    >>> dataset_models = datasets.load_dataset(
        path = "leebecca/NCI_Open_Compounds",
        name = f"{dataset_tag}_models",
        data_dir = f"{dataset_tag}")['train']

The dataset is loaded as a `datasets.arrow_dataset.Dataset`:

    >>> dataset_models
    Dataset({
        features: [
            'NSC', 'duplicate_idx', 'CID', 'SID', 'CAS', 'entry_id',
            'entry_name', 'name', 'formula', 'smiles', 'mw', 'tot_q',
            'tot_abs_q', 'chiralities_consistent', 'chiral_flag', 'flags',
            'charging_adjusted_penalty', 'ionization_penalty',
            'ionization_penalty_charging', 'ionization_penalty_neutral',
            'state_penalty', 'energy', 'tautomer_probability', 'input_file',
            'structure_evaluation', 'chemistry_notes', 'pka_notes'
        ],
        num_rows: 445794
    })

## Dataset Details

### Dataset Description

The set contains the ligprep output of the minimized 3D structures, expanded to include possible protonation states and tautomers, capped at 3 per ligand.
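
As a rough illustration of the per-ligand cap described above, the sketch below counts rows per ligand in a tab-separated table and lists any ligand that exceeds the cap. It assumes that each row is one protonation or tautomer state and that `NSC` identifies the ligand; neither is confirmed by the card. The sample rows and the helper names `states_per_ligand` and `ligands_over_cap` are hypothetical; only the column names are taken from the feature list.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; the column names come from
# the dataset card, the values do not.
SAMPLE_TSV = (
    "NSC\tsmiles\ttautomer_probability\n"
    "1\tCCO\t1.0\n"
    "2\tCC(=O)O\t0.7\n"
    "2\tCC(=O)[O-]\t0.3\n"
)


def states_per_ligand(tsv_text, key="NSC"):
    """Count rows per ligand ID (assumption: one row per state)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[key] for row in reader)


def ligands_over_cap(counts, cap=3):
    """Return the sorted ligand IDs that have more than `cap` rows."""
    return sorted(ligand for ligand, n in counts.items() if n > cap)


counts = states_per_ligand(SAMPLE_TSV)
over = ligands_over_cap(counts)
print(dict(counts), over)
```

The same check could be run on the real `nci_compounds.tsv` by passing the file contents as `tsv_text`. If `NSC` turns out not to be unique per ligand (the `duplicate_idx` column suggests it may not be), the grouping key would need to change.
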
- **Acknowledgements:** We kindly acknowledge RosettaCommons

### Dataset Sources

https://wiki.nci.nih.gov/spaces/NCIDTPdata/pages/155844992/Chemical+Data

## Uses

### Out-of-Scope Use

### Source Data

## Citation

## Dataset Card Authors

Becca Lee (beccalee5@g.ucla.edu)

Provided by:

RosettaCommons
