A Comprehensive Dataset of Chemical Reactions Covering Second and Third Row Elements with Million-Scale Quantum Chemical Calculations
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18551029
Description:
Files contained in Reaction-QM
The Reaction-QM dataset consists of three datasets: GFN2-RXN, B3LYP-RXN, and B3LYP-IRC, which are organized as follows:
GFN2-RXN
GFN2-xTB_reaction_info.csv: Summarizes reactions with reaction SMILES (SMARTS) and reaction properties calculated at the GFN2-xTB level.
GFN2_xTB.h5: Contains the fully optimized geometries of reactants, products, and transition states, along with their corresponding electronic energies calculated at the GFN2-xTB level.
GFN2-RXN_train_250k.csv: Training data (80% of the 250k subset of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
GFN2-RXN_valid_250k.csv: Validation data (10% of the 250k subset of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
GFN2-RXN_test_250k.csv: Test data (10% of the 250k subset of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
GFN2-RXN_train_full.csv: Training data (80% of the full set of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
GFN2-RXN_valid_full.csv: Validation data (10% of the full set of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
GFN2-RXN_test_full.csv: Test data (10% of the full set of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
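The splits are described as 80/10/10 fractions. If the 250k subset holds exactly 250,000 reactions, that would be 200,000 training, 25,000 validation and 25,000 test reactions. The sketch below counts the data rows in each downloaded split file so you can check the fractions locally. It assumes each CSV has exactly one header row. The helper names (count_rows, split_fractions, report_splits) are hypothetical and not part of the dataset.

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Count data rows in a CSV file, excluding the (assumed) header row."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)  # handles quoted fields containing commas
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def split_fractions(paths):
    """Return {split_name: (row_count, fraction_of_total)} for the given files."""
    counts = {name: count_rows(path) for name, path in paths.items()}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("all split files are empty")
    return {name: (n, n / total) for name, n in counts.items()}


def report_splits(data_dir, prefix, suffix=""):
    """Print counts for <prefix>_{train,valid,test}<suffix>.csv in data_dir."""
    paths = {
        split: Path(data_dir) / f"{prefix}_{split}{suffix}.csv"
        for split in ("train", "valid", "test")
    }
    for name, (n, frac) in split_fractions(paths).items():
        print(f"{name}: {n} rows ({frac:.1%})")
```

For example, assume the files sit in a directory named after their dataset. Then report_splits("GFN2-RXN", "GFN2-RXN", "_250k") reads the three *_250k.csv files, and the suffix "_full" selects the full-set splits. report_splits("B3LYP-RXN", "B3LYP-RXN") reads the B3LYP-RXN splits.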
B3LYP-RXN
B3LYPD3_TZVP_reaction_info.csv: Summarizes reactions with reaction SMILES (SMARTS) and reaction properties calculated at the B3LYP-D3/TZVP level.
B3LYPD3_TZVP.h5: Contains the fully optimized geometries of reactants, products, and transition states, along with their corresponding electronic energies calculated at the B3LYP-D3/TZVP level.
B3LYP-RXN_train.csv: Training data (80%), split based on the chemical diversity of reactions.
B3LYP-RXN_valid.csv: Validation data (10%), split based on the chemical diversity of reactions.
B3LYP-RXN_test.csv: Test data (10%), split based on the chemical diversity of reactions.
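The reader script in this description prints each species' EHG entry as the electronic energy, enthalpy and Gibbs free energy in Hartree. You may want to recompute barrier heights and reaction energies from the HDF5 files rather than take them from the *_reaction_info.csv summaries. The sketch below is an illustration of that, not necessarily how the CSV properties were derived. It assumes the following:

- Energies are summed over all reactant (and all product) species.
- There is a single TS per reaction.

Which group tags label the reactants, products and TS is not stated here, so the function takes the (E, H, G) triples directly. The factor 1 Hartree = 627.509474 kcal/mol is standard. The function names are hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 Hartree expressed in kcal/mol
LABELS = ("E", "H", "G")  # order of values in each species' EHG entry


def sum_species(species):
    """Element-wise sum of (E, H, G) triples over several species."""
    return [sum(values) for values in zip(*species)]


def reaction_energetics(reactants, products, ts):
    """Barrier and reaction energy in kcal/mol for each of E, H and G.

    reactants, products: lists of (E, H, G) triples in Hartree, one per species.
    ts: a single (E, H, G) triple in Hartree for the transition state.
    """
    r = sum_species(reactants)
    p = sum_species(products)
    t = [float(v) for v in ts]
    return {
        label: {
            "barrier": (t[i] - r[i]) * HARTREE_TO_KCAL_PER_MOL,
            "reaction": (p[i] - r[i]) * HARTREE_TO_KCAL_PER_MOL,
        }
        for i, label in enumerate(LABELS)
    }
```

With this layout, reaction_energetics(...)["G"]["barrier"] gives a free-energy barrier, and the "E" entry gives the purely electronic one.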
B3LYP-IRC
B3LYPD3_TZVP_IRC.h5: Contains IRC trajectories of the reactions included in B3LYP-RXN, calculated at the B3LYP-D3/TZVP level using Gaussian 16.
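The following script is a simple example of reading the GFN2_xTB.h5 or B3LYPD3_TZVP.h5 file. Pass the file path and, optionally, a reaction ID (for example RXN_0000000001 or just 1) as command-line arguments: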
import sys

import h5py


def print_data(species_data):
    # Read and print one species (reactant, product, or TS) from its HDF5 group
    smiles = species_data['smiles'].asstr()[()]
    EHG = species_data['EHG'][()]
    chg = species_data['charge'][()]
    multiplicity = species_data['multiplicity'][()]
    z_list = species_data['atomic_numbers'][()]
    coords = species_data['coordinates'][()]
    print(f"SMILES: {smiles}")
    print(f"E, H, G (Hartree): {EHG}")
    print(f"Charge : {chg}")
    print(f"Spin multiplicity : {multiplicity}")
    print('xyz coordinates (Å):')
    for atom_num, coord in zip(z_list, coords):
        print(f"{atom_num} {coord[0]} {coord[1]} {coord[2]}")


# Enter the HDF5 file path and target reaction id (optional)
file_name = sys.argv[1]
target = sys.argv[2] if len(sys.argv) > 2 else 'RXN_0000000001'

# Normalize the reaction id: a bare number such as "42" becomes "RXN_0000000042"
length = 10
if target.isdigit():
    rxn_number = target.rjust(length, '0')
    target = f'RXN_{rxn_number}'

found = False
with h5py.File(file_name, 'r') as h5file:
    for root_key, root_value in h5file.items():
        # root_key: RXN_A to RXN_B, root_value: group of molecule + TS data
        rxn_name_keys = root_value.keys()
        print(f'Checking files in {root_key} ...')
        if target not in rxn_name_keys:
            continue

        print('Found desired key !')
        found = True
        molecules_and_ts_dict = root_value[target]
        print(f"---- Information of {target} ----")

        # Parse data of reactants, products, and transition state
        for molecule_tag, molecule_data in molecules_and_ts_dict.items():
            print(f"-------------------- {molecule_tag} --------------------")
            print_data(molecule_data)

        break  # Print only one reaction

if not found:
    print(f'Desired reaction (={target}) not found !!')
    print('Check the key again ...')
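The reader script prints atomic numbers rather than element symbols. To open a structure in a molecular viewer, you can write the atomic numbers and Cartesian coordinates (in Å) in the standard XYZ format. The sketch below assumes every atom has Z ≤ 18 (H through Ar), consistent with the dataset's coverage of second- and third-row elements, and raises an error otherwise. to_xyz is a hypothetical helper.

```python
# Element symbols for Z = 1..18 (H through Ar)
SYMBOLS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
           "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar"]


def to_xyz(atomic_numbers, coordinates, comment=""):
    """Format one structure as an XYZ block (coordinates in Angstrom)."""
    lines = [str(len(atomic_numbers)), comment]  # atom count, then comment line
    for number, (x, y, z) in zip(atomic_numbers, coordinates):
        number = int(number)
        if not 1 <= number <= len(SYMBOLS):
            raise ValueError(f"no symbol table entry for Z={number}")
        lines.append(f"{SYMBOLS[number - 1]:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"
```

For example, inside print_data the string returned by to_xyz(z_list, coords, smiles) could be written to a file named after the reaction and molecule tag.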
The following script is another example, for reading the B3LYPD3_TZVP_IRC.h5 file:
import sys

import h5py

# Switch the file path if you want to read another IRC data file
file_name = sys.argv[1]
target = sys.argv[2] if len(sys.argv) > 2 else 'RXN_0000000001'

# Normalize the reaction id: a bare number such as "42" becomes "RXN_0000000042"
length = 10
if target.isdigit():
    rxn_number = target.rjust(length, '0')
    target = f'RXN_{rxn_number}'

with h5py.File(file_name, 'r') as h5file:
    # Each reaction group holds the IRC trajectory arrays directly
    TS_info_dict = h5file[target]
    print(f"---- {target} ----")
    numbers = TS_info_dict['atomic_numbers'][()]
    coords = TS_info_dict['coordinates'][()]
    energies = TS_info_dict['energies'][()]
    forces = TS_info_dict['forces'][()]
    print(f"Atomic numbers: {numbers}")
    print(f"Coordinates: {coords}")
    print(f"Energies: {energies}")
    print(f"Forces: {forces}")
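The IRC group stores per-frame energies alongside coordinates and forces. The sketch below summarizes such a path. It finds the highest-energy frame, reports every frame's energy relative to it in kcal/mol, and gives the barrier measured from each end of the path. It assumes energies is a 1-D array in Hartree (Gaussian's native unit; the unit is not stated in this description). It does not assume which end of the path is the reactant side. irc_profile is a hypothetical helper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 Hartree expressed in kcal/mol


def irc_profile(energies):
    """Summarize a 1-D IRC energy path given in Hartree.

    Returns the index of the highest-energy frame, all energies relative to
    that frame in kcal/mol, and the barrier seen from the first and last frame.
    """
    values = [float(e) for e in energies]
    if len(values) < 2:
        raise ValueError("an IRC path needs at least two frames")
    peak = max(range(len(values)), key=values.__getitem__)  # highest-energy frame
    top = values[peak]
    relative = [(e - top) * HARTREE_TO_KCAL_PER_MOL for e in values]
    return {
        "peak_index": peak,
        "relative_kcal": relative,
        "barrier_from_first": -relative[0],
        "barrier_from_last": -relative[-1],
    }
```

For example, calling irc_profile(energies) on the energies array read by the script above shows whether the highest point of the path sits at an interior frame, as expected for a TS-centered IRC.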
Provided by:
Zenodo
Created:
2026-02-23



