USPTO Reaction Dataset

Indexed from arXiv on 2025-09-30
Download link:
https://github.com/connorcoley/rdchiral
Overview:
This dataset contains approximately 3.8 million chemical reactions extracted from the United States Patent and Trademark Office (USPTO). Its task is retrosynthetic planning: the reactions are used to train single-step (one-step) models and to extract synthetic routes. To improve data quality, the reactions were deduplicated and reactions with erroneous mappings were removed, leaving approximately 1.3 million reactions for training, validation, and testing. These 1.3 million reactions are split into three subsets: 80% for training, 10% for validation, and 10% for testing.
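The deduplication and 80/10/10 split described above can be sketched in Python. This is a minimal, hypothetical sketch: the function name `dedup_and_split`, the fixed random seed, and exact string matching on reaction SMILES are assumptions, since the source does not say how duplicates were detected or whether the split was random. A real pipeline would typically canonicalize the SMILES (for example with RDKit) before comparing them and would also filter mapping errors, which this sketch does not attempt.

```python
import random


def dedup_and_split(reactions, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Hypothetical sketch: remove exact duplicate reaction strings,
    then shuffle and split into train/validation/test subsets."""
    # Keep the first occurrence of each reaction string, preserving order.
    # Assumption: duplicates are identical strings (no canonicalization).
    unique = list(dict.fromkeys(reactions))

    # Shuffle with a local, seeded RNG so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Compute subset sizes; any rounding remainder goes to the test set.
    n = len(unique)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])

    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Toy reaction SMILES: 10 distinct reactions, each listed twice.
    rxns = [f"C{'C' * i}O>>C{'C' * i}=O" for i in range(10)] * 2
    train, val, test = dedup_and_split(rxns, seed=42)
    print(len(train), len(val), len(test))
```

Applied to the roughly 1.3 million cleaned reactions, these fractions give about 1.04 million training reactions and about 130,000 reactions each for validation and testing.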
Provider:
United States Patent and Trademark Office (USPTO)
The content above was collected and summarized by the 遇见数据集 dataset portal.