RMG-DB-11: Enumerating Reaction Space for Small Molecule Chemistry

Source: NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/8144352
Description:
This repository presents approximately 750 million atom-mapped reaction SMILES. The reactions are generated by applying templates from the Reaction Mechanism Generator (RMG) database to a subset of the species in GDB11. The dataset is therefore called RMG-DB-11, i.e., the Reaction Mechanism Generator Database whose species contain up to 11 heavy atoms. All SMILES have been canonicalized with RDKit, and every reaction is labeled with its corresponding RMG template. The data is intended as a starting point for quantitative predictive chemistry: many methods that search for transition state structures require atom-mapped SMILES, which this repository provides. It is also well suited to unsupervised pre-training of machine learning models.

To parse the data with Python, start with import pandas as pd. Reactions with 1-8 heavy atoms can be parsed with pd.read_csv(<file name>). Reactions with 9 heavy atoms can be parsed with pd.read_pickle(<file name>, compression='zip'); the file names in the repository include the word "zip" as a hint to use the compression argument. Because of the large number of reactions with 10 and 11 heavy atoms, these are split into smaller chunks: first untar the file with tar -xvf <file name> to obtain several zipped pickle files, each of which can be parsed the same way as the 9-heavy-atom file.
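The parsing steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' own script: the function names are hypothetical, the file paths are left for the caller to supply, and the chunk loader assumes that every file in the extracted archive is a zipped pickle. It uses Python's tarfile module in place of the tar -xvf command.

```python
import tarfile
from pathlib import Path

import pandas as pd


def load_csv_reactions(path):
    """Load a CSV file of reactions (1-8 heavy atoms, per the description)."""
    return pd.read_csv(path)


def load_zipped_pickle(path):
    """Load one zipped pickle (the 9-heavy-atom file, or one 10/11 chunk)."""
    # Unpickling can execute arbitrary code: only open files from a trusted source.
    return pd.read_pickle(path, compression="zip")


def iter_tar_chunks(tar_path, extract_dir):
    """Untar an archive, then yield one DataFrame per extracted file.

    Assumption: every file inside the archive is a zipped pickle.
    """
    extract_dir = Path(extract_dir)
    with tarfile.open(tar_path) as archive:
        # Python counterpart of `tar -xvf`; use only on trusted archives.
        archive.extractall(extract_dir)
    for chunk_path in sorted(extract_dir.rglob("*")):
        if chunk_path.is_file():
            yield load_zipped_pickle(chunk_path)
```

Yielding the chunks one at a time, rather than concatenating them, keeps only one chunk in memory at once, which is likely to matter for the 10- and 11-heavy-atom sets given the overall size of the dataset.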
Created:
2024-01-08