USPTO Reaction Dataset
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/connorcoley/rdchiral
Resource overview:
This dataset contains approximately 3.8 million chemical reactions extracted from United States Patent and Trademark Office (USPTO) patents. The reactions are used to train single-step models and to extract synthetic routes. To improve data quality, the reactions were deduplicated and entries with erroneous atom mappings were removed, leaving approximately 1.3 million reactions for training, validation, and testing. These 1.3 million reactions are split 80% for training, 10% for validation, and 10% for testing. The full dataset comprises 3.8 million reactions, and its target task is retrosynthetic planning.
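The deduplication and 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the dataset's actual preprocessing pipeline. It assumes each reaction is already a canonical reaction SMILES string (`reactants>>products`), so exact string comparison finds duplicates; a real pipeline would first canonicalize with a cheminformatics toolkit such as RDKit. The atom-mapping cleanup is not shown. The function names `deduplicate` and `split_80_10_10`, the fixed `seed`, and the placeholder strings `R0>>P0` and so on are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def deduplicate(reactions: Sequence[str]) -> List[str]:
    """Drop exact duplicate reaction strings, keeping first-occurrence order.

    Assumption: inputs are already canonicalized (hypothetical helper).
    """
    seen = set()
    unique = []
    for rxn in reactions:
        if rxn not in seen:
            seen.add(rxn)
            unique.append(rxn)
    return unique


def split_80_10_10(
    reactions: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Randomly partition reactions into 80% train, 10% validation, 10% test."""
    shuffled = list(reactions)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Placeholder reaction strings, not real chemistry; two entries are repeated.
    raw = [f"R{i}>>P{i}" for i in range(10)] + ["R0>>P0", "R1>>P1"]
    unique = deduplicate(raw)
    train, val, test = split_80_10_10(unique, seed=42)
    print(len(train), len(val), len(test))  # prints "8 1 1"
```

Deduplicating before splitting matters: if identical reactions were split first, the same reaction could land in both the training and test sets and inflate test accuracy.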
Provided by:
United States Patent and Trademark Office (USPTO)

The above content was collected and summarized by 遇见数据集 (Meet Datasets).



