
Images dataset for Chemical Images Classifier model

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13378717
Description:
Original paper

The manually curated images dataset is part of the Supplementary Materials of the paper: A. Krasnov, S. Barnabas, T. Böhme, S. Boyer, L. Weber, "Comparing software tools for optical chemical structure recognition", Digital Discovery (2024). https://doi.org/10.1039/D3DD00228D

Images dataset description

The dataset was used to train the image classifier model. It consists of 16,000 images collected from the following sources:

1)    Chemical data images extracted from EP, US, and WO patents by OntoChem GmbH.
2)    Images from the MolScribe datasets. https://pubs.acs.org/doi/10.1021/acs.jcim.2c01480
3)    The DECIMER hand-drawn molecule images dataset: H.O. Brinkhaus, A. Zielesny, C. Steinbeck, K. Rajan, "DECIMER - hand-drawn molecule images dataset", 2022, Journal of Cheminformatics, 14, 36. https://doi.org/10.1186/s13321-022-00620-9
4)    Images from the RxnScribe training set: Y. Qian, J. Guo, Z. Tu, C.W. Coley, R. Barzilay, "RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing", 2023, arXiv:2305.11845v1. https://doi.org/10.48550/arXiv.2305.11845
5)    Formula images from the im2latex-100k dataset: "A prebuilt dataset for OpenAI's task for image-2-latex system". https://zenodo.org/record/56198#.YJjuCGZKgox (accessed 16 January 2024)

Structure of dataset

The dataset consists of two directories.

The "classified" directory contains manually labeled images. These images are divided into four distinct categories, with each category including 4,000 images:

●      one_molecule
●      several_molecules
●      reactions
●      other

In the "for_model" directory, the images are split into training, validation, and test sets for building the Chemical Image Classifier model:

●      training: 12,804 images
●      test: 1,604 images
●      validation: 1,604 images
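
The directory layout described above can be sanity-checked after download with a short script. The following is a minimal sketch, not part of the original dataset: it assumes the archive has been unpacked locally, that the four category names are used verbatim as subdirectory names under "classified", and that images carry common raster file extensions. The function names `count_images` and `split_fractions` and the local path are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Category names as listed in the dataset description; using them as
# subdirectory names is an assumption about the unpacked archive.
CATEGORIES = ("one_molecule", "several_molecules", "reactions", "other")

# Assumed set of image file extensions; adjust to the actual files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def count_images(root: Path) -> Counter:
    """Count image files found under each category subdirectory of `root`."""
    counts = Counter()
    for category in CATEGORIES:
        folder = root / category
        if not folder.is_dir():
            counts[category] = 0  # missing folder is reported as zero images
            continue
        counts[category] = sum(
            1
            for path in folder.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of images."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


if __name__ == "__main__":
    # Hypothetical local path to the unpacked "classified" directory.
    for category, n in count_images(Path("classified")).items():
        print(f"{category}: {n} images")

    # Split sizes as stated in the dataset description.
    listed = {"training": 12804, "test": 1604, "validation": 1604}
    for name, share in split_fractions(listed).items():
        print(f"{name}: {share:.1%}")
```

With the split sizes as listed, the shares come out to roughly 80% training, 10% test, and 10% validation. The listed sizes sum to 16,012 images.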
Created:
2024-08-27