Hydrogen Abstraction Reaction Data

Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20039313
Description:
================================================================
SDF (in zip file)
A collection of hydrogen abstraction reactions, stored in SDF format.
================================================================

Each SDF contains the transition-state structure together with the reactant and product molecules. Additional information about each molecule is stored as SDF properties.

Always-required properties (present for every record)
- reaction (string): Unique reaction identifier used to link all records belonging to the same reaction.
- type (string): Record role within the reaction. One of: r1, r1h (donor radical / donor + abstracted H); r2, r2h (acceptor radical / acceptor + abstracted H); ts (transition state).
- rmg_smiles (string): Canonical RMG SMILES representation (unmapped).
- rmg_adjacency_list (string): RMG adjacency list describing the molecular graph.
- ordered_mapped_smiles (string): Atom-mapped SMILES with a stable atom ordering. The atom map numbers define the canonical mapping used for role assignment.
- role_mapnums (JSON): Mapping from role keys (e.g., r1h_a, r1h_h, r2h_a, r2h_h) to atom map numbers in ordered_mapped_smiles.
- role_mapidx (JSON): Mapping from the same role keys to RDKit atom indices (0-based).
- mol_properties (JSON): Per-atom metadata keyed by RDKit atom index (0-based, stored as strings). Each entry contains:
    label: one of donor, acceptor, d_hydrogen, a_hydrogen, or the TS path tags *0 … *4
    atom_type: RMG atom type (e.g., Cs, Cd, O2s, H0)
- multiplicity (int): Spin multiplicity.
- spin_multiplicity (float): Numeric form of the spin multiplicity (redundant; used by some downstream tools).
- level_of_theory (string): Combined method/basis label (e.g., wb97xd/def2tzvp).
- lot_method (string): Electronic-structure method only (e.g., wb97xd).
- lot_basis (string): Basis set only (e.g., def2tzvp).
- optical_isomers (float): Number of optical isomers.
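
RDKit's Chem.SDMolSupplier exposes these fields as molecule properties (GetProp / GetPropsAsDict). As a dependency-free sketch, the Python below parses only the SDF data items (a "> <name>" header followed by value lines up to a blank line), groups records by reaction and type, decodes the JSON-valued properties, and converts the string keys of mol_properties into integer atom indices. It assumes JSON fields are stored as plain JSON text; the helper names and the toy record (including the id rxn_toy) are hypothetical.

```python
import json
import re
from collections import defaultdict

# Matches SDF data-item headers such as "> <reaction>" or ">  <type>  (1)"
HEADER = re.compile(r"^>\s*<([^>]+)>")

def read_sdf_properties(text):
    """Return one {property: raw string value} dict per SDF record; atom/bond blocks are skipped."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # trailing text after the final record separator
        props, name, lines = {}, None, []
        for line in block.splitlines():
            match = HEADER.match(line)
            if match:
                name, lines = match.group(1), []
            elif name is not None:
                if line.strip():
                    lines.append(line)
                else:  # a blank line terminates the current data item
                    props[name] = "\n".join(lines)
                    name = None
        if name is not None:  # last data item not followed by a blank line
            props[name] = "\n".join(lines)
        records.append(props)
    return records

def group_by_reaction(records):
    """Index records as {reaction: {type: props}}, decoding the JSON-valued properties."""
    grouped = defaultdict(dict)
    for props in records:
        for key in ("role_mapnums", "role_mapidx", "mol_properties"):
            if key in props:
                props[key] = json.loads(props[key])
        grouped[props["reaction"]][props["type"]] = props
    return dict(grouped)

def atoms_with_label(mol_properties, label):
    """mol_properties keys are strings, so convert them to int RDKit atom indices."""
    return sorted(int(i) for i, meta in mol_properties.items() if meta.get("label") == label)

# Toy record with made-up values; a real SDF holds the TS and species records of one reaction.
toy_sdf = """toy_r1h


  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <reaction>
rxn_toy

> <type>
r1h

> <role_mapidx>
{"r1h_a": 0, "r1h_h": 1}

> <mol_properties>
{"0": {"label": "donor", "atom_type": "Cs"}, "1": {"label": "d_hydrogen", "atom_type": "H0"}}

$$$$
"""

reactions = group_by_reaction(read_sdf_properties(toy_sdf))
r1h = reactions["rxn_toy"]["r1h"]
print(r1h["role_mapidx"]["r1h_a"])                            # 0
print(atoms_with_label(r1h["mol_properties"], "d_hydrogen"))  # [1]
```

Keeping the raw property strings until grouping time means only the fields known to be JSON are decoded; everything else (SMILES, adjacency lists, numbers) stays as text for the caller to convert.
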

Arkane-derived thermochemistry (present when available)
These properties are included when Arkane parsing succeeds; otherwise they may be absent or set to null.
- frequencies_cm1_list (list of float): Vibrational frequencies in cm⁻¹.
- ts_imag_freq_cm1 (float or null): Imaginary frequency for transition states; null for non-TS records.
- E0_value, E0_units (float, string): Zero-point-corrected energy.
- H298_value, H298_units (float, string): Standard enthalpy at 298 K.
- S298_value, S298_units (float, string): Standard entropy at 298 K.
- Cp_T_value_list, Cp_T_units (list, string): Heat capacities at the tabulated temperatures.
- T_list, T_units (list, string): Temperature grid corresponding to the tabulated properties.
- polynomials (JSON): NASA polynomial coefficients and validity ranges.
- ZPE_kJmol (float): Zero-point energy in kJ/mol.
- E_elec_kJmol (float): Electronic energy in kJ/mol.
- Convenience fields (float): Rounded or unit-converted summaries such as E0_kJmol and H298_kJmol.

================================================================
reactions.csv
Graph-ready Arrhenius targets, one row per reaction.
================================================================

A flat CSV containing the kinetic targets and reaction-centre annotations, intended for graph-only model development without the SDF geometries. One row per reaction, indexed by rxn_id.

Identification fields (always present)
- rxn_id (string): Unique reaction identifier; matches the SDF filename and the cross-reference key used in the supplementary data files.
- sdf_file (string): Filename of the matching SDF in the SDF archive deposit.

SMILES (unmapped and atom-mapped)
- r1h_smiles, r2h_smiles (string): Unmapped canonical SMILES for the donor (with abstracted H) and acceptor (with abstracted H) species.
- r1h_ordered_mapped_smiles, r2h_ordered_mapped_smiles, r1_ordered_mapped_smiles, r2_ordered_mapped_smiles (string): Atom-mapped SMILES with stable atom ordering for the four species participating in the reaction (donor + H, acceptor + H, donor radical, acceptor radical).
- ts_ordered_mapped_smiles (string): Atom-mapped SMILES for the transition-state complex.
- ts_role_mapidx (JSON): Mapping from TS role keys to RDKit atom indices.

Reaction-centre annotations
- r1h_center_idx, r2h_center_idx (int): RDKit atom index of the donor or acceptor heavy atom directly bonded to the abstracted hydrogen.
- r1h_center_label, r2h_center_label (string): One of donor or acceptor.
- r1h_center_atom_type, r2h_center_atom_type (string): RMG atom type at the reaction centre (e.g., Cs, Cd, O2s, N3s).

Modified-Arrhenius parameters (forward and reverse directions)
- A_for_TSTT, A_rev_TSTT (float): Pre-exponential factor in cm^3 mol^-1 s^-1 from the Arkane TST + Eckart fit.
- A_for_TSTT_log10, A_rev_TSTT_log10 (float): Base-10 logarithm of the prefactor.
- n_for_TSTT, n_rev_TSTT (float): Modified-Arrhenius temperature exponent (dimensionless).
- Ea_for_TSTT, Ea_rev_TSTT (float): Activation energy in kJ/mol.
- dA_for_TSTT, dA_rev_TSTT (float): Multiplicative residual on the prefactor, as reported by Arkane.
- dn_for_TSTT, dn_rev_TSTT (float): Additive residual on the temperature exponent.
- dEa_for_TSTT, dEa_rev_TSTT (float): Additive residual on the activation energy, in kJ/mol.

Temperature range
- Tmin, Tmax (float): Temperature range in K over which k(T) was evaluated and the modified-Arrhenius fit was performed (typically 300-3000 K).

================================================================
dlpno_thermo.json
Per-species NASA polynomials at the paper's electronic-structure level.
================================================================

A JSON file releasing, in NASA polynomial form, the per-species thermochemistry used to construct the kinetic targets. The top level is a list of records, one per reaction.

Per-record fields
- rxn_id (string): Matches the corresponding row in reactions.csv and the matching SDF filename.
- polynomials (JSON): Dictionary keyed by species role (r1, r2, r1h, r2h).
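
Because rxn_id is the shared key, reactions.csv rows and dlpno_thermo.json records can be joined directly. The Python sketch below does that and evaluates a fitted rate constant in log space. It assumes the Arkane-style form k(T) = A T^n exp(-Ea/RT) with reference temperature T0 = 1 K (not stated in this description), reads the prefactor from the A_*_TSTT_log10 column (so ln k refers to k in cm^3 mol^-1 s^-1), and uses Ea in kJ/mol as documented for reactions.csv. The helper names ln_k, load_reactions and join_thermo, the id rxn_toy, and the toy values are hypothetical.

```python
import csv
import io
import json
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1 (Ea columns are in kJ/mol)

def ln_k(log10_A, n, Ea_kJmol, T):
    """ln k(T) for k = A * T**n * exp(-Ea / (R*T)), assuming reference temperature T0 = 1 K.

    Starting from the log10 prefactor keeps the whole calculation in log space.
    """
    return log10_A * math.log(10) + n * math.log(T) - Ea_kJmol / (R_KJ * T)

def load_reactions(f):
    """Index reactions.csv rows by rxn_id (values stay strings, as csv returns them)."""
    return {row["rxn_id"]: row for row in csv.DictReader(f)}

def join_thermo(reactions, thermo_records):
    """Attach each dlpno_thermo.json record's role-keyed polynomials to its reactions.csv row."""
    joined = {}
    for rec in thermo_records:
        row = reactions.get(rec["rxn_id"])
        if row is not None:
            joined[rec["rxn_id"]] = {**row, "polynomials": rec["polynomials"]}
    return joined

# Toy stand-ins; with the real files use, e.g.
#   with open("reactions.csv", newline="") as f: reactions = load_reactions(f)
#   with open("dlpno_thermo.json") as f: thermo = json.load(f)
toy_csv = "rxn_id,A_for_TSTT_log10,n_for_TSTT,Ea_for_TSTT\nrxn_toy,12.0,0.0,0.0\n"
reactions = load_reactions(io.StringIO(toy_csv))
thermo = json.loads('[{"rxn_id": "rxn_toy", "polynomials": {"r1h": []}}]')

row = reactions["rxn_toy"]
lnk_1000 = ln_k(float(row["A_for_TSTT_log10"]), float(row["n_for_TSTT"]),
                float(row["Ea_for_TSTT"]), 1000.0)
joined = join_thermo(reactions, thermo)
print(round(lnk_1000, 3))  # 27.631  (= 12 * ln 10, since n = Ea = 0 in the toy row)
```

Working with ln k rather than k avoids overflow and underflow across the wide 300-3000 K fit range, and it is the natural scale on which to compare forward and reverse rates.
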

Per-species polynomial format
Each species entry is a list of two NASA polynomial pieces, the low-temperature piece followed by the high-temperature piece, each formatted as:
  [Tmin (K), Tmax (K), [a1, a2, a3, a4, a5, a6, a7]]
The seven coefficients follow the standard NASA-7 convention:
$$C_p/R = a_1 + a_2 T + a_3 T^2 + a_4 T^3 + a_5 T^4$$
$$H/(RT) = a_1 + a_2 T/2 + a_3 T^2/3 + a_4 T^3/4 + a_5 T^4/5 + a_6/T$$
$$S/R = a_1 \ln T + a_2 T + a_3 T^2/2 + a_4 T^3/3 + a_5 T^4/4 + a_7$$
Computed by Arkane at the paper's electronic-structure level: DLPNO-CCSD(T)-F12/cc-pVTZ-F12 single-point energies on wb97xd/def2-TZVP optimised geometries, with Eckart tunnelling corrections at the TS and atom-energy corrections only (no bond-additivity corrections).

================================================================
oof_predictions.csv
Leakage-free out-of-fold model predictions, one row per reaction.
================================================================

A CSV containing the held-out (out-of-fold) prediction from the published RA+Geom DMPNN under the 10-fold group-aware cross-validation split. Each prediction is generated by the fold whose training split excluded that reaction, so the predictions are leakage-free.
- rxn_id (string), fold (int), idx (int): Reaction identifier, index of the held-out fold (0-9), and position within the fold.
- donor_smiles, acceptor_smiles (string): Canonical SMILES for the donor and acceptor species.
- donor_name, acceptor_name (string): Species labels used internally.
- y_true_A_for, y_true_n_for, y_true_Ea_for, y_true_A_rev, y_true_n_rev, y_true_Ea_rev (float): Reference modified-Arrhenius parameters from the dataset.
- y_pred_A_for, y_pred_n_for, y_pred_Ea_for, y_pred_A_rev, y_pred_n_rev, y_pred_Ea_rev (float): Model-predicted parameters for the held-out reaction.
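
As a worked check of the NASA-7 convention used in dlpno_thermo.json, the Python sketch below picks the [Tmin, Tmax] piece that contains T and evaluates the three dimensionless expressions, then converts them with R = 8.314462618 J mol^-1 K^-1. The coefficients are toy values (constant Cp/R = 3.5), not data from the deposit, and select_piece/nasa7 are hypothetical helper names. At the shared boundary temperature the sketch uses the low-temperature piece; that is a choice made here, not something the format specifies.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def select_piece(pieces, T):
    """Return the coefficients of the first [Tmin, Tmax, coeffs] piece whose range contains T."""
    for tmin, tmax, coeffs in pieces:
        if tmin <= T <= tmax:
            return coeffs
    raise ValueError(f"T = {T} K lies outside every polynomial range")

def nasa7(pieces, T):
    """Dimensionless (Cp/R, H/RT, S/R) at T in K, using the NASA-7 expressions."""
    a1, a2, a3, a4, a5, a6, a7 = select_piece(pieces, T)
    cp_R = a1 + a2 * T + a3 * T**2 + a4 * T**3 + a5 * T**4
    h_RT = a1 + a2 * T / 2 + a3 * T**2 / 3 + a4 * T**3 / 4 + a5 * T**4 / 5 + a6 / T
    s_R = a1 * math.log(T) + a2 * T + a3 * T**2 / 2 + a4 * T**3 / 3 + a5 * T**4 / 4 + a7
    return cp_R, h_RT, s_R

# Toy two-piece entry; a real entry would be record["polynomials"][role], e.g. role = "r1h".
toy_pieces = [
    [100.0, 1000.0, [3.5, 0.0, 0.0, 0.0, 0.0, -1000.0, 2.0]],
    [1000.0, 5000.0, [3.5, 0.0, 0.0, 0.0, 0.0, -1000.0, 2.0]],
]
cp_R, h_RT, s_R = nasa7(toy_pieces, 500.0)
Cp = cp_R * R                  # J mol^-1 K^-1
H = h_RT * R * 500.0 / 1000.0  # kJ mol^-1
S = s_R * R                    # J mol^-1 K^-1
print(cp_R, h_RT)  # 3.5 1.5
```
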

================================================================
best_config.json
Hyperparameter configuration of the headline model.
================================================================

A small JSON file recording the hyperparameter configuration of the RA+Geom DMPNN reported in the manuscript: feature-mode flags, MPNN depth and width, optimiser settings, and Morgan-fingerprint parameters. It is used together with repro_bundle.json and pinned_best-best.ckpt to reproduce the headline model.

================================================================
repro_bundle.json
Full reproducibility bundle.
================================================================

A JSON file containing the trained scalers (vf_scaler_b64, y_scaler_b64), the per-fold cross-validation split indices, the data hashes used during training (hash_rad_csv, hash_target_csv), and the reported headline mae_lnk_avg. Together with pinned_best-best.ckpt, it enables bit-for-bit reproduction of the headline model without retraining.

================================================================
pinned_best-best.ckpt
Trained RA+Geom DMPNN checkpoint.
================================================================

The published RA+Geom DMPNN checkpoint, exactly as used to generate the manuscript's headline results. It can be loaded with the inference scripts shipped in the accompanying GitHub repository to evaluate any user-supplied reaction directly, without retraining.
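
Scoring the oof_predictions.csv columns in ln k space relates them to the mae_lnk_avg figure recorded in repro_bundle.json. The Python sketch below is one plausible scoring, not necessarily the manuscript's definition of mae_lnk_avg, so its number need not reproduce the headline value. It assumes the y_*_A_* columns hold log10 A (pass a_is_log10=False if they hold the raw prefactor), that Ea is in kJ/mol as in reactions.csv, and that T0 = 1 K; it averages |ln k_pred - ln k_true| over both directions on a hypothetical 300-3000 K grid matching the typical fit range. mean_abs_lnk_error, the id rxn_toy, and the toy row are hypothetical.

```python
import csv
import io
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def ln_k(A, n, Ea_kJmol, T, a_is_log10=True):
    """ln k for k = A * T**n * exp(-Ea / (R*T)); reference temperature T0 = 1 K assumed."""
    ln_A = A * math.log(10) if a_is_log10 else math.log(A)
    return ln_A + n * math.log(T) - Ea_kJmol / (R_KJ * T)

def mean_abs_lnk_error(rows, temps, a_is_log10=True):
    """Mean |ln k_pred - ln k_true| over rows, both directions, and every temperature in temps."""
    errors = []
    for row in rows:
        for d in ("for", "rev"):
            true = [float(row[f"y_true_{p}_{d}"]) for p in ("A", "n", "Ea")]
            pred = [float(row[f"y_pred_{p}_{d}"]) for p in ("A", "n", "Ea")]
            for T in temps:
                errors.append(abs(ln_k(*pred, T, a_is_log10) - ln_k(*true, T, a_is_log10)))
    return sum(errors) / len(errors)

# Hypothetical evaluation grid: 300 K to 3000 K in 300 K steps.
temps = [300.0 * i for i in range(1, 11)]

# Toy row with made-up values; for the real file use
#   with open("oof_predictions.csv", newline="") as f: rows = list(csv.DictReader(f))
toy_csv = (
    "rxn_id,fold,y_true_A_for,y_true_n_for,y_true_Ea_for,y_true_A_rev,y_true_n_rev,y_true_Ea_rev,"
    "y_pred_A_for,y_pred_n_for,y_pred_Ea_for,y_pred_A_rev,y_pred_n_rev,y_pred_Ea_rev\n"
    "rxn_toy,0,12.0,0.0,40.0,11.0,1.0,60.0,12.5,0.0,40.0,11.5,1.0,60.0\n"
)
rows = list(csv.DictReader(io.StringIO(toy_csv)))
mae = mean_abs_lnk_error(rows, temps)
print(round(mae, 4))  # 1.1513  (a 0.5-decade error in A is 0.5 * ln 10 at every T)
```

Grouping rows by the fold column before calling mean_abs_lnk_error gives per-fold scores, which is a quick way to check that no single held-out fold dominates the average.
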

Provider: Zenodo
Created: 2026-05-05