Hydrogen Abstraction Reaction Data
Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20039313
Resource description:
================================================================
SDF (In zip file)
A collection of hydrogen abstraction reactions, stored in SDF format.
================================================================
Each SDF file contains the transition-state structure together with the reactant and product molecules. Additional information about each molecule is provided as properties in the SDF.
Always-required properties (present for every record)
reaction (string): Unique reaction identifier used to link all records belonging to the same reaction.
type (string): Record role within the reaction. One of: r1, r1h (donor heavy atom / donor + abstracted H), r2, r2h (acceptor heavy atom / acceptor + abstracted H), ts (transition state).
rmg_smiles (string): Canonical RMG SMILES representation (unmapped).
rmg_adjacency_list (string): RMG adjacency list describing the molecular graph.
ordered_mapped_smiles (string): Atom-mapped SMILES with a stable atom ordering. Atom map numbers define the canonical mapping used for role assignment.
role_mapnums (JSON): Mapping from role keys (e.g., r1h_a, r1h_h, r2h_a, r2h_h) to atom map numbers (from ordered_mapped_smiles).
role_mapidx (JSON): Mapping from the same role keys to RDKit atom indices (0-based).
mol_properties (JSON): Per-atom metadata keyed by RDKit atom index (0-based, stored as strings). Each entry contains:
  label: one of donor, acceptor, d_hydrogen, a_hydrogen, or TS path tags *0 … *4
  atom_type: RMG atom type (e.g., Cs, Cd, O2s, H0)
multiplicity (int): Spin multiplicity.
spin_multiplicity (float): Numeric form of spin multiplicity (redundant, used by some downstream tools).
level_of_theory (string): Combined method/basis label (e.g., wb97xd/def2tzvp).
lot_method (string): Electronic structure method only (e.g., wb97xd).
lot_basis (string): Basis set only (e.g., def2tzvp).
optical_isomers (float): Number of optical isomers.
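The properties listed above can be read without a cheminformatics toolkit, because SDF data items are plain text. The following is a minimal sketch, assuming that the JSON-typed properties are stored as JSON-encoded strings in standard "> <name>" data items; the sample record, its identifier, and its values are hypothetical and only illustrate the layout.

```python
import json

def read_sdf_properties(text):
    """Return one dict of data items per SDF record (the molblock is ignored)."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        props, name, value_lines = {}, None, []
        for line in block.splitlines():
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                # Header line of a data item, e.g. "> <type>"
                name = line[line.index("<") + 1 : line.rindex(">")]
                value_lines = []
            elif name is not None:
                if line.strip() == "":
                    # A blank line terminates the current data item
                    props[name] = "\n".join(value_lines)
                    name = None
                else:
                    value_lines.append(line)
        if name is not None:  # last item not followed by a blank line
            props[name] = "\n".join(value_lines)
        records.append(props)
    return records

# Hypothetical record: empty molblock plus a few of the documented properties.
SAMPLE = """\
hypothetical_record
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <reaction>
rxn_000001

> <type>
r1h

> <role_mapidx>
{"r1h_a": 0, "r1h_h": 1}

> <mol_properties>
{"0": {"label": "donor", "atom_type": "Cs"}, "1": {"label": "d_hydrogen", "atom_type": "H0"}}

$$$$
"""

record = read_sdf_properties(SAMPLE)[0]
role_mapidx = json.loads(record["role_mapidx"])
mol_properties = json.loads(record["mol_properties"])

# mol_properties is keyed by the atom index as a string, so convert before lookup.
donor_idx = role_mapidx["r1h_a"]
donor_meta = mol_properties[str(donor_idx)]
print(record["reaction"], record["type"], donor_idx, donor_meta)
```

Grouping the returned dictionaries by their reaction value would collect the r1, r1h, r2, r2h, and ts records that belong to one reaction.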
Arkane-derived thermochemistry (present when available)
These properties are included when Arkane parsing succeeds; otherwise they may be absent or set to null.
frequencies_cm1_list (list of float): Vibrational frequencies in cm⁻¹.
ts_imag_freq_cm1 (float or null): Imaginary frequency for transition states; null for non-TS records.
E0_value, E0_units (float, string): Zero-point corrected energy.
H298_value, H298_units (float, string): Standard enthalpy at 298 K.
S298_value, S298_units (float, string): Standard entropy at 298 K.
Cp_T_value_list, Cp_T_units (list, string): Heat capacities at tabulated temperatures.
T_list, T_units (list, string): Temperature grid corresponding to tabulated properties.
polynomials (JSON): NASA polynomial coefficients and validity ranges.
ZPE_kJmol (float): Zero-point energy in kJ/mol.
E_elec_kJmol (float): Electronic energy in kJ/mol.
Convenience fields (float): Rounded or unit-converted summaries such as E0_kJmol, H298_kJmol.
================================================================
reactions.csv
Graph-ready Arrhenius targets, one row per reaction.
================================================================
A flat CSV containing the kinetic targets and reaction-centre annotations for graph-only model development without requiring the SDF geometries. One row per reaction, indexed by rxn_id.
Identification fields (always present)
rxn_id (string): Unique reaction identifier; matches the SDF filename and the cross-reference key used in the supplementary data files.
sdf_file (string): Filename of the matching SDF in the SDF archive deposit.
SMILES fields (unmapped and atom-mapped)
r1h_smiles, r2h_smiles (string): Unmapped canonical SMILES for the donor (with abstracted H) and acceptor (with abstracted H) species.
r1h_ordered_mapped_smiles, r2h_ordered_mapped_smiles, r1_ordered_mapped_smiles, r2_ordered_mapped_smiles (string): Atom-mapped SMILES with stable atom ordering for the four species participating in the reaction (donor + H, acceptor + H, donor radical, acceptor radical).
ts_ordered_mapped_smiles (string): Atom-mapped SMILES for the transition-state complex.
ts_role_mapidx (JSON): Mapping from TS role keys to RDKit atom indices.
Reaction-centre annotations
r1h_center_idx, r2h_center_idx (int): RDKit atom index of the donor or acceptor heavy atom directly bonded to the abstracted hydrogen.
r1h_center_label, r2h_center_label (string): One of donor or acceptor.
r1h_center_atom_type, r2h_center_atom_type (string): RMG atom type at the reaction centre (e.g., Cs, Cd, O2s, N3s).
Modified-Arrhenius parameters (forward and reverse directions)
A_for_TSTT, A_rev_TSTT (float): Pre-exponential factor in cm^3 mol^-1 s^-1 from the Arkane TST + Eckart fit.
A_for_TSTT_log10, A_rev_TSTT_log10 (float): Base-10 logarithm of the prefactor.
n_for_TSTT, n_rev_TSTT (float): Modified-Arrhenius temperature exponent (dimensionless).
Ea_for_TSTT, Ea_rev_TSTT (float): Activation energy in kJ/mol.
dA_for_TSTT, dA_rev_TSTT (float): Multiplicative residual on the prefactor reported by Arkane.
dn_for_TSTT, dn_rev_TSTT (float): Additive residual on the temperature exponent.
dEa_for_TSTT, dEa_rev_TSTT (float): Additive residual on the activation energy in kJ/mol.
Temperature range
Tmin, Tmax (float): Temperature range in K over which k(T) was evaluated and the modified-Arrhenius fit was performed (typically 300-3000 K).
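To illustrate how the modified-Arrhenius columns combine into a rate coefficient, the sketch below evaluates k(T) = A (T/T0)^n exp(-Ea / (R T)) for one row. It assumes a reference temperature T0 of 1 K, which is not stated in the description above and should be checked against the fitting setup; the CSV row and all its numbers are hypothetical.

```python
import csv
import io
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def rate_coefficient(A, n, Ea_kJmol, T, T0=1.0):
    """Modified Arrhenius: k(T) = A * (T/T0)**n * exp(-Ea / (R*T)).

    T0 = 1 K is an assumption, not a value given by the dataset description.
    """
    return A * (T / T0) ** n * math.exp(-Ea_kJmol / (R_KJ * T))

# Hypothetical row; the numbers are illustrative, not taken from the dataset.
SAMPLE = """rxn_id,A_for_TSTT,n_for_TSTT,Ea_for_TSTT,Tmin,Tmax
rxn_000001,1.5e-3,4.2,30.0,300.0,3000.0
"""

row = next(csv.DictReader(io.StringIO(SAMPLE)))
T = 1000.0

# Only evaluate inside the range over which the fit was performed.
if not float(row["Tmin"]) <= T <= float(row["Tmax"]):
    raise ValueError("T lies outside the fitted temperature range")

k_for = rate_coefficient(
    float(row["A_for_TSTT"]),
    float(row["n_for_TSTT"]),
    float(row["Ea_for_TSTT"]),
    T,
)
print(f"{row['rxn_id']}: k_for({T:.0f} K) = {k_for:.3e} cm^3 mol^-1 s^-1")
```

The same function applies to the reverse direction by passing the A_rev_TSTT, n_rev_TSTT, and Ea_rev_TSTT columns.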
================================================================
dlpno_thermo.json
Per-species NASA polynomials at the paper's electronic-structure level.
================================================================
A JSON file providing the per-species thermochemistry used to construct the kinetic targets, in NASA polynomial form. The top level is a list of records, one per reaction.
Per-record fields
rxn_id (string): Matches the corresponding row in reactions.csv and the matching SDF filename.
polynomials (JSON): Dictionary keyed by species role (r1, r2, r1h, r2h).
Per-species polynomial format
Each species record is a list of two NASA polynomial pieces, low-temperature followed by high-temperature, each piece formatted as:
[Tmin (K), Tmax (K), [a1, a2, a3, a4, a5, a6, a7]]
The seven coefficients follow the standard NASA-7 convention:
Cp / R = a1 + a2 T + a3 T^2 + a4 T^3 + a5 T^4
H / RT = a1 + a2 T / 2 + a3 T^2 / 3 + a4 T^3 / 4 + a5 T^4 / 5 + a6 / T
S / R = a1 ln T + a2 T + a3 T^2 / 2 + a4 T^3 / 3 + a5 T^4 / 4 + a7
The polynomials are computed by Arkane at the paper's electronic-structure level: DLPNO-CCSD(T)-F12 / cc-pVTZ-F12 single-point energies on wb97xd / def2-TZVP optimised geometries, with Eckart tunnelling corrections at the TS and atom-energy corrections only (no bond-additivity corrections).
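The piece format and the NASA-7 expressions above translate directly into code. The sketch below selects the piece whose temperature range contains T and evaluates Cp/R, H/RT, and S/R; the function names and the coefficients of the sample species are hypothetical and chosen only so the result is easy to check by hand.

```python
import math

def select_piece(pieces, T):
    """Return the coefficient list of the piece whose [Tmin, Tmax] contains T."""
    for tmin, tmax, coeffs in pieces:
        if tmin <= T <= tmax:
            return coeffs
    raise ValueError(f"T = {T} K is outside all polynomial ranges")

def nasa7(coeffs, T):
    """Evaluate the dimensionless Cp/R, H/(RT), and S/R from seven coefficients."""
    a1, a2, a3, a4, a5, a6, a7 = coeffs
    cp_over_R = a1 + a2 * T + a3 * T**2 + a4 * T**3 + a5 * T**4
    h_over_RT = (a1 + a2 * T / 2 + a3 * T**2 / 3 + a4 * T**3 / 4
                 + a5 * T**4 / 5 + a6 / T)
    s_over_R = (a1 * math.log(T) + a2 * T + a3 * T**2 / 2 + a4 * T**3 / 3
                + a5 * T**4 / 4 + a7)
    return cp_over_R, h_over_RT, s_over_R

# Hypothetical species entry in the documented layout:
# [Tmin (K), Tmax (K), [a1..a7]] for the low- and high-temperature pieces.
pieces = [
    [100.0, 1000.0, [2.5, 0.0, 0.0, 0.0, 0.0, -745.0, 4.0]],
    [1000.0, 5000.0, [2.5, 0.0, 0.0, 0.0, 0.0, -745.0, 4.0]],
]

T = 298.15
cp, h, s = nasa7(select_piece(pieces, T), T)
print(f"Cp/R = {cp:.3f}, H/RT = {h:.3f}, S/R = {s:.3f}")
```

Multiplying by R (and by T for the enthalpy) converts the dimensionless values to molar quantities in the unit system of the chosen gas constant.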
================================================================
oof_predictions.csv
Leakage-free out-of-fold model predictions, one row per reaction.
================================================================
A CSV file containing the held-out (out-of-fold) predictions from the published RA+Geom DMPNN under the 10-fold group-aware cross-validation split. Each prediction is generated by the model of the fold whose training split excluded that reaction, so the predictions are leakage-free.
rxn_id (string), fold (int), idx (int): Reaction identifier, fold index of the held-out split (0-9), and within-fold position.
donor_smiles, acceptor_smiles (string): Canonical SMILES for the donor and acceptor species.
donor_name, acceptor_name (string): Species labels used internally.
y_true_A_for, y_true_n_for, y_true_Ea_for, y_true_A_rev, y_true_n_rev, y_true_Ea_rev (float): Reference modified-Arrhenius parameters from the dataset.
y_pred_A_for, y_pred_n_for, y_pred_Ea_for, y_pred_A_rev, y_pred_n_rev, y_pred_Ea_rev (float): Model's predicted parameters for the held-out reaction.
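As an example of how the paired y_true and y_pred columns might be used, the sketch below computes a per-fold mean absolute error for a single target. It is not the manuscript's headline metric; the helper name and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using a subset of the documented columns.
SAMPLE = """rxn_id,fold,idx,y_true_Ea_for,y_pred_Ea_for
rxn_a,0,0,40.0,42.0
rxn_b,0,1,55.0,54.0
rxn_c,1,0,30.0,33.0
"""

def mae_by_fold(rows, target):
    """Mean absolute error of y_pred_<target> against y_true_<target>, per fold."""
    errors = defaultdict(list)
    for row in rows:
        err = abs(float(row[f"y_pred_{target}"]) - float(row[f"y_true_{target}"]))
        errors[int(row["fold"])].append(err)
    return {fold: sum(v) / len(v) for fold, v in sorted(errors.items())}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
per_fold = mae_by_fold(rows, "Ea_for")
print(per_fold)  # {0: 1.5, 1: 3.0}
```

Passing "n_for", "Ea_rev", or another target suffix reuses the same helper for the remaining column pairs.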
================================================================
best_config.json
Hyperparameter configuration of the headline model.
================================================================
A small JSON file recording the hyperparameter configuration of the RA+Geom DMPNN reported in the manuscript: feature-mode flags, MPNN depth and width, optimiser settings, and Morgan-fingerprint parameters. Used together with repro_bundle.json and pinned_best-best.ckpt to reproduce the headline model.
================================================================
repro_bundle.json
Full reproducibility bundle.
================================================================
A JSON file containing the trained scalers (vf_scaler_b64, y_scaler_b64), per-fold cross-validation split indices, the data hashes used during training (hash_rad_csv, hash_target_csv), and the reported headline mae_lnk_avg. Together with pinned_best-best.ckpt, it enables bit-for-bit reproduction of the headline model without retraining.
================================================================
pinned_best-best.ckpt
Trained RA+Geom DMPNN checkpoint.
================================================================
The published RA+Geom DMPNN checkpoint, exactly as used to generate the manuscript's headline results. It can be loaded with the inference scripts shipped in the accompanying GitHub repository to evaluate any user-supplied reaction without retraining.
Provider:
Zenodo
Created:
2026-05-05



