Multimodal Spectroscopic Dataset
Source: arXiv (indexed 2025-09-30)
Download link:
https://rxn4chemistry.github.io/multimodal-spectroscopic-dataset
Description:
This dataset contains simulated 1H-NMR, 13C-NMR, HSQC-NMR, infrared (IR), and mass spectra for 794,403 unique molecules extracted from chemical reactions in patent data. It supports the development of foundation models that integrate information from multiple spectroscopic modalities, and it also serves as a benchmark for evaluating single-modality tasks. The covered tasks include structure elucidation, spectrum prediction for target molecules, and functional group prediction.
Provided by:
Authors of the paper



