
Benchmark Data for Chemprop

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8174267
Description:
Datasets and splits for the manuscript "Chemprop: Machine Learning Package for Chemical Property Prediction." Each folder contains the train, validation and test splits, plus any additional data required by some of the benchmarks. To train Chemprop models, see our code repository, which provides ready-to-use scripts for training machine learning models on each of the systems.

Available benchmarking systems:
- `hiv`: HIV replication inhibition from MoleculeNet and OGB, with scaffold splits
- `pcba_random`: Biological activities from MoleculeNet with random splits (missing targets filled in with zeros, as provided by MoleculeNet)
- `pcba_random_nans`: Biological activities from MoleculeNet with random splits, formatted to match OGB (missing targets not filled in with zeros)
- `pcba_scaffold`: Biological activities from OGB with scaffold splits
- `qm9_multitask`: DFT-calculated properties from MoleculeNet and OGB, trained as a multi-task model
- `qm9_u0`: DFT-calculated properties from MoleculeNet and OGB, trained as a single-task model on the U0 target only
- `qm9_gap`: DFT-calculated properties from MoleculeNet and OGB, trained as a single-task model on the gap target only
- `sampl`: Water-octanol partition coefficients, used to predict molecules from the SAMPL6, 7 and 9 challenges
- `atom_bond_137k`: Quantum-mechanical atom and bond descriptors
- `bde`: Bond dissociation enthalpies, trained as a single-task model
- `bde_charges`: Bond dissociation enthalpies, trained as a multi-task model together with atomic partial charges
- `charges_eps_4`: Partial charges at a dielectric constant of 4 (in protein)
- `charges_eps_78`: Partial charges at a dielectric constant of 78 (in water)
- `barriers_e2`: Reaction barrier heights of E2 reactions
- `barriers_sn2`: Reaction barrier heights of SN2 reactions
- `barriers_cycloadd`: Reaction barrier heights of cycloaddition reactions
- `barriers_rdb7`: Reaction barrier heights in the RDB7 dataset
- `barriers_rgd1`: Reaction barrier heights in the RGD1-CNHO dataset
- `multi_molecule`: UV/Vis peak absorption wavelengths in different solvents
- `ir`: IR spectra
- `pcqm4mv2`: HOMO-LUMO gaps of the PCQM4Mv2 dataset
- `uncertainty_ensemble`: Uncertainty estimation with an ensemble on the QM9 gap dataset
- `uncertainty_evidential`: Uncertainty estimation with evidential learning on the QM9 gap dataset
- `uncertainty_mve`: Uncertainty estimation with mean-variance estimation on the QM9 gap dataset
- `timing`: Timing benchmark using subsets of the QM9 gap dataset

Version: This version of the dataset (Version 2) is compatible with all versions of Chemprop (that support the respective functionality). Version 1 is compatible with all versions except Chemprop v1.6.1, which cannot process the `charges_eps_4` and `charges_eps_78` datasets (all other benchmarks work as expected). We therefore recommend always using Version 2 of the dataset (with the reformatted `charges_eps_4` and `charges_eps_78` datasets), since it is compatible with all versions of Chemprop. For use with any other ML software, either version can be used.
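As a rough illustration of why both `pcba_random` and `pcba_random_nans` are provided: when missing targets are filled with zeros, an unmeasured compound becomes indistinguishable from a measured inactive one, whereas keeping missing targets as NaN lets a loss function skip them. The minimal Python sketch below assumes a toy single-assay label list and made-up predictions, and uses a hypothetical `masked_bce` helper. It is not Chemprop's implementation, only an illustration of target masking.

```python
import math

# Hypothetical toy labels for one assay: 1 = active, 0 = inactive, None = not measured.
raw_labels = [1, None, 0, None, 1]
# Hypothetical model outputs (predicted probability of activity).
predictions = [0.9, 0.2, 0.1, 0.7, 0.6]


def zero_filled(labels):
    """MoleculeNet-style: missing targets are replaced by 0, i.e. treated as inactive."""
    return [0.0 if y is None else float(y) for y in labels]


def nan_preserved(labels):
    """OGB-style: missing targets stay NaN so they can be masked out of the loss."""
    return [math.nan if y is None else float(y) for y in labels]


def masked_bce(preds, targets):
    """Hypothetical helper: mean binary cross-entropy over entries whose target is not NaN.

    Returns (mean_loss, number_of_targets_used).
    """
    losses = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for p, t in zip(preds, targets)
        if not math.isnan(t)  # skip unmeasured entries
    ]
    return sum(losses) / len(losses), len(losses)


loss_zero, n_zero = masked_bce(predictions, zero_filled(raw_labels))
loss_nan, n_nan = masked_bce(predictions, nan_preserved(raw_labels))
print(f"zero-filled: {n_zero} targets used, loss={loss_zero:.3f}")
print(f"NaN-masked:  {n_nan} targets used, loss={loss_nan:.3f}")
```

With zero-filled labels, both unmeasured compounds count as inactives, so the prediction of 0.7 on one of them is penalized even though no measurement exists; with NaN-preserved labels only the three measured entries contribute to the loss.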
Created:
2023-11-09