
Dataset used in the paper "Assessing the Extrapolation Capability of Template-free Retrosynthesis Models"

Source: DataCite Commons. Updated: 2026-04-06. Indexed: 2026-02-09.
Download link:
https://figshare.com/articles/dataset/Dataset_used_in_the_paper_b_A_critical_assessment_of_the_retrosynthetic_extrapolating_capability_of_template-free_machine_learning_models_b_/30843134
Description:
This repository provides the complete datasets, raw model predictions, and analysis scripts required to reproduce all results and figures presented in the paper "Assessing the Extrapolation Capability of Template-free Retrosynthesis Models". The data allows researchers to evaluate and analyze the extrapolation capabilities of various template-free retrosynthesis models, focusing on their ability to handle out-of-distribution (OOD) reaction templates and on their round-trip prediction accuracy.

Repository Structure & Contents: The repository is organized into two scales based on the Lowe USPTO dataset: a 50k dataset and a 480k dataset.

dataset_50k/ & dataset_480k/: Contain the raw reaction data and the ground-truth test sets used to regenerate the train/valid/test splits. The data is included both with and without atom mapping (generated via LocalMapper).

prediction_50k/ & prediction_480k/: Contain the raw, unmodified prediction outputs from the evaluated baseline models: Molecular Transformer (MT), MEGAN (MG), GraphRetro (GR), and Chemformer (CF). They also include the forward prediction results generated by LocalTransform for round-trip validation.

analysis_50k/ & analysis_480k/: Provide the processed prediction files required to compute the final evaluation metrics:
- Exact-match accuracy (Top-1 to Top-10)
- Template distribution analysis (categorizing predictions)
- Round-trip accuracy (using LocalTransform)

Usage & Reproducibility: Detailed instructions for reproducing the full analysis are provided in our GitHub repository, Assessing the Extrapolation Capability of Template-free Retrosynthesis Models. That repository includes the complete five-step pipeline for preprocessing predictions, processing forward-reaction data, and generating the final evaluation metrics.
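As an illustration of the exact-match metric listed above, the following is a minimal sketch of how Top-1 to Top-10 accuracy could be computed from ranked predictions. It is not the authors' analysis script: the function names `normalize_reactants` and `top_k_exact_match` are hypothetical, the inputs are plain in-memory lists rather than the repository's file formats (which are not described here), and the normalization only sorts dot-separated components. A real evaluation would likely canonicalize each SMILES string with a cheminformatics toolkit before comparing.

```python
from typing import Callable, Dict, Optional, Sequence


def normalize_reactants(smiles: str) -> str:
    """Return an order-insensitive form of a dot-separated reactant string.

    Assumption: each molecule is already written as canonical SMILES, so
    sorting the components is enough to compare two reactant sets.
    """
    return ".".join(sorted(part for part in smiles.strip().split(".") if part))


def top_k_exact_match(
    predictions: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    ks: Sequence[int] = tuple(range(1, 11)),
    normalize: Callable[[str], str] = normalize_reactants,
) -> Dict[int, float]:
    """Compute top-k exact-match accuracy for each k in `ks`.

    predictions[i] is the ranked list of predicted reactant strings for
    test reaction i; ground_truth[i] is the recorded reactant string.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("at least one test reaction is required")

    hits = {k: 0 for k in ks}
    for ranked, truth in zip(predictions, ground_truth):
        target = normalize(truth)
        # 1-based rank of the first prediction that matches the ground truth.
        rank: Optional[int] = next(
            (i for i, pred in enumerate(ranked, start=1) if normalize(pred) == target),
            None,
        )
        if rank is None:
            continue
        for k in ks:
            if rank <= k:
                hits[k] += 1

    total = len(ground_truth)
    return {k: hits[k] / total for k in ks}


if __name__ == "__main__":
    # Toy example with three test reactions (made-up strings, not dataset entries).
    preds = [
        ["CCO.CC(=O)O", "CCO"],
        ["c1ccccc1", "CC(=O)O.CCO"],
        ["CC"],
    ]
    truth = ["CC(=O)O.CCO", "CC(=O)O.CCO", "CCN"]
    print(top_k_exact_match(preds, truth, ks=(1, 2)))
```

Taking the rank of the first matching prediction once per reaction keeps the accuracy monotonically non-decreasing in k, which is the expected behavior for a Top-1 to Top-10 curve.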

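This dataset was constructed for and used in the study titled "A critical assessment of the retrosynthetic extrapolating capability of template-free machine learning models", which evaluates the extrapolation performance of state-of-the-art template-free retrosynthesis models on out-of-distribution benchmark test sets.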
Provider:
figshare
Created:
2025-12-10