
A Comprehensive Dataset of Chemical Reactions Covering Second and Third Row Elements with Million-Scale Quantum Chemical Calculations

Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17377504
Description:
Files contained in Reaction-QM

The Reaction-QM dataset consists of three datasets, GFN2-RXN, B3LYP-RXN, and B3LYP-IRC, organized as follows.

GFN2-RXN
  GFN2-xTB_reaction_info.csv: Summarizes reactions with reaction SMILES (SMARTS) and reaction properties calculated at the GFN2-xTB level.
  GFN2_xTB.h5: Contains the fully optimized geometries of reactants, products, and transition states, along with their corresponding electronic energies calculated at the GFN2-xTB level.
  GFN2-RXN_train_250k.csv: Training data (80% of the 250k subset of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
  GFN2-RXN_valid_250k.csv: Validation data (10% of the 250k subset of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
  GFN2-RXN_test_250k.csv: Test data (10% of the 250k subset of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
  GFN2-RXN_train_full.csv: Training data (80% of the full set of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
  GFN2-RXN_valid_full.csv: Validation data (10% of the full set of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.
  GFN2-RXN_test_full.csv: Test data (10% of the full set of GFN2-xTB_reaction_info.csv), split based on the chemical diversity of reactions.

B3LYP-RXN
  B3LYPD3_TZVP_reaction_info.csv: Summarizes reactions with reaction SMILES (SMARTS) and reaction properties calculated at the B3LYP-D3/TZVP level.
  B3LYPD3_TZVP.h5: Contains the fully optimized geometries of reactants, products, and transition states, along with their corresponding electronic energies calculated at the B3LYP-D3/TZVP level.
  B3LYP-RXN_train.csv: Training data (80%), split based on the chemical diversity of reactions.
  B3LYP-RXN_valid.csv: Validation data (10%), split based on the chemical diversity of reactions.
  B3LYP-RXN_test.csv: Test data (10%), split based on the chemical diversity of reactions.
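As a quick sanity check on the train, validation, and test CSV files listed above, the sketch below counts the data rows in each file and reports each split's share of the total. It assumes every CSV starts with a single header row; the helper names `count_rows` and `split_fractions` are illustrative and not part of the dataset.

```python
import csv


def count_rows(csv_path):
    """Count data rows in a CSV file, excluding the header line."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        return sum(1 for _ in reader)


def split_fractions(paths):
    """Map each split name to its share of the total row count.

    `paths` is a dict such as {"train": "...", "valid": "...", "test": "..."}.
    """
    counts = {name: count_rows(path) for name, path in paths.items()}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("all split files are empty")
    return {name: n / total for name, n in counts.items()}
```

Calling `split_fractions` with the paths of B3LYP-RXN_train.csv, B3LYP-RXN_valid.csv, and B3LYP-RXN_test.csv should return values close to 0.8, 0.1, and 0.1 if the files follow the stated 80/10/10 split.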
B3LYP-IRC
  B3LYPD3_TZVP_IRC.h5: Contains IRC trajectories of the reactions included in B3LYP-RXN, calculated at the B3LYP-D3/TZVP level using Gaussian 16.

The following script is a simple example for reading the GFN2_xTB.h5 or B3LYPD3_TZVP.h5 file:

    import sys
    import h5py


    def print_data(species_data):
        """Print the stored properties and geometry of one species."""
        smiles = species_data['smiles'].asstr()[()]
        EHG = species_data['EHG'][()]
        chg = species_data['charge'][()]
        multiplicity = species_data['multiplicity'][()]
        z_list = species_data['atomic_numbers'][()]
        coords = species_data['coordinates'][()]
        print(f"SMILES: {smiles}")
        print(f"E, H, G (Hartree): {EHG}")
        print(f"Charge : {chg}")
        print(f"Spin multiplicity : {multiplicity}")
        print('xyz coordinates (Å):')
        for atom_num, coord in zip(z_list, coords):
            print(f"{atom_num} {coord[0]} {coord[1]} {coord[2]}")


    # Enter the HDF5 file path and target reaction id (optional)
    file_name = sys.argv[1]
    h5file = h5py.File(file_name, 'r')
    target = sys.argv[2] if len(sys.argv) > 2 else 'RXN_0000000001'

    # Accept a bare number and expand it to the zero-padded key,
    # e.g. 1 -> RXN_0000000001
    length = 10
    if target.isdigit():
        rxn_number = target.rjust(length, '0')
        target = f'RXN_{rxn_number}'

    found = False
    # root_key: RXN_A to RXN_B, root_value: dict of molecule + ts data
    for root_key, root_value in h5file.items():
        rxn_name_keys = root_value.keys()
        print(f'Checking files in {root_key} ...')
        if target not in rxn_name_keys:
            continue

        print('Found desired key !')
        found = True
        molecules_and_ts_dict = root_value[target]

        print(f"---- Information of {target} ----")

        # Parse data of reactants, products, and transition state
        for molecule_tag, molecule_data in molecules_and_ts_dict.items():
            print(f"-------------------- {molecule_tag} --------------------")
            print_data(molecule_data)

        break  # Print only one reaction

    if not found:
        print(f'Desired reaction (={target}) not found !!')
        print('Check the key again ...')

    h5file.close()

The following script is another example for reading the B3LYPD3_TZVP_IRC.h5 file:

    import sys
    import h5py

    # Switch the file path if you want to read another IRC data
    file_name = sys.argv[1]
    h5file = h5py.File(file_name, 'r')
    target = sys.argv[2] if len(sys.argv) > 2 else 'RXN_0000000001'

    # Accept a bare number and expand it to the zero-padded key,
    # e.g. 1 -> RXN_0000000001
    length = 10
    if target.isdigit():
        rxn_number = target.rjust(length, '0')
        target = f'RXN_{rxn_number}'

    TS_info_dict = h5file[target]
    print(f"---- {target} ----")
    numbers = TS_info_dict['atomic_numbers'][()]
    coords = TS_info_dict['coordinates'][()]
    energies = TS_info_dict['energies'][()]
    forces = TS_info_dict['forces'][()]
    print(f"Atomic numbers: {numbers}")
    print(f"Coordinates: {coords}")
    print(f"Energies: {energies}")
    print(f"Forces: {forces}")

    h5file.close()
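Once the `energies` array of an IRC trajectory has been read as in the script above, it is often convenient to view it as a relative energy profile. The following is a minimal sketch under two assumptions that the description does not state explicitly: that `energies` is a one-dimensional sequence with one value per IRC frame, and that the values are in Hartree (as for the E, H, G values in the other HDF5 files). The function names are illustrative.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard Hartree -> kcal/mol factor


def relative_profile_kcal(energies_hartree):
    """Return energies relative to the lowest frame, in kcal/mol."""
    energies = [float(e) for e in energies_hartree]  # assumes a 1D sequence
    if not energies:
        raise ValueError("no energies given")
    lowest = min(energies)
    return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies]


def highest_point(energies_hartree):
    """Return (frame index, relative energy in kcal/mol) of the highest frame."""
    profile = relative_profile_kcal(energies_hartree)
    index = max(range(len(profile)), key=profile.__getitem__)
    return index, profile[index]
```

For example, `highest_point(energies)` gives the frame index that lies highest along the stored path, which for a well-behaved IRC would be expected to sit at or near the transition state.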
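The first reading script prints geometries as atomic numbers followed by coordinates. To view a geometry in a molecular viewer, it can help to convert it to the common XYZ text format. The sketch below is one way to do that; the helper name `to_xyz` is hypothetical, and the symbol table covers only hydrogen through argon, on the assumption that the dataset contains no heavier elements.

```python
# Element symbols for Z = 1..18 (first three rows of the periodic table).
ELEMENT_SYMBOLS = {
    1: "H", 2: "He", 3: "Li", 4: "Be", 5: "B", 6: "C", 7: "N", 8: "O",
    9: "F", 10: "Ne", 11: "Na", 12: "Mg", 13: "Al", 14: "Si", 15: "P",
    16: "S", 17: "Cl", 18: "Ar",
}


def to_xyz(atomic_numbers, coordinates, comment=""):
    """Format one geometry as an XYZ-format string.

    `atomic_numbers` is a sequence of integers and `coordinates` a sequence
    of (x, y, z) triples in Å, e.g. the arrays read from the HDF5 files.
    """
    numbers = [int(z) for z in atomic_numbers]
    coords = [[float(v) for v in row] for row in coordinates]
    if len(numbers) != len(coords):
        raise ValueError("atomic_numbers and coordinates differ in length")
    lines = [str(len(numbers)), comment]
    for z, (x, y, zc) in zip(numbers, coords):
        if z not in ELEMENT_SYMBOLS:
            raise ValueError(f"no symbol stored for atomic number {z}")
        lines.append(f"{ELEMENT_SYMBOLS[z]:<2} {x:14.8f} {y:14.8f} {zc:14.8f}")
    return "\n".join(lines) + "\n"
```

Inside `print_data`, a call such as `to_xyz(z_list, coords, comment=smiles)` would produce text that can be written to a `.xyz` file.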
Provider: Zenodo
Created: 2025-10-31