CHORISO - chemical reaction SMILES from academic journals
Source: DataCite Commons. Updated 2025-06-01; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/CHORISO_-_chemical_reaction_SMILES_from_academic_journals/22598230/1
Description:
CHORISO (CHemical Organic ReactIon SMILES Omnibus) is a curated dataset of chemical reaction SMILES extracted from high-impact-factor journals. It is built from the CJHIF dataset, and the resulting data is used to propose a new holistic evaluation of reaction prediction models (see paper). A detailed explanation of the processing steps and proposed metrics is included in this repo.
The following files are included:
choriso_public.tar.gz: compressed file containing the CHORISO dataset, 2,224,239 canonical reaction SMILES.
uspto_public.tar.gz: file containing the USPTO dataset, cleaned and processed following the same pipeline as CHORISO.
splits.tar.gz: compressed folder containing the training, validation and test files used to train and evaluate models in the study.
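Since the dataset stores reactions as reaction SMILES, a small sketch of how one such string can be taken apart may help. It relies only on the general reaction SMILES convention (`reactants>agents>products`, with molecules separated by `.`); the internal layout of the archives is not described here, so nothing below assumes a file or column name. The helper `split_reaction_smiles` and the example esterification are hypothetical illustrations, not taken from CHORISO.

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    reactants: list[str]
    agents: list[str]
    products: list[str]


def split_reaction_smiles(rxn: str) -> Reaction:
    """Split a 'reactants>agents>products' string into lists of SMILES.

    Hypothetical helper: it assumes a plain reaction SMILES with no
    trailing extensions, and it treats every '.'-separated fragment as a
    separate species (so the ions of a salt come out as separate entries).
    """
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got {rxn!r}")

    def molecules(field: str) -> list[str]:
        # An empty field (e.g. no agents) yields an empty list.
        return [m for m in field.split(".") if m]

    return Reaction(*(molecules(p) for p in parts))


# Illustrative reaction (not from the dataset):
# acetic acid + ethanol -> ethyl acetate + water
rxn = split_reaction_smiles("CC(=O)O.OCC>>CC(=O)OCC.O")
print(rxn.reactants)
print(rxn.agents)
print(rxn.products)
```

Splitting on the text alone keeps the sketch dependency-free; validating or canonicalizing the individual molecules would need a cheminformatics toolkit.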
Provider:
figshare
Created:
2023-12-15
Collected summary
Dataset introduction

Background and challenges
Background overview
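CHORISO is a curated dataset of chemical reaction SMILES, containing 2,224,239 canonical reaction SMILES drawn from high-impact-factor journals. It is used for a holistic evaluation of reaction prediction models, and it is distributed together with a version of the USPTO dataset cleaned and processed through the same pipeline.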
The summary above was collected and generated by 遇见数据集.



