Supplementary data for the manuscript: Image2SMILES: Transformer-based Molecular Optical Recognition Engine
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5069805
Description:
This is the supplementary data for the manuscript: Image2SMILES: Transformer-based Molecular Optical Recognition Engine
It contains image-string pairs generated from 1M SMILES strings randomly selected from the PubChem database.
It was prepared using the code published at https://github.com/syntelly/img2smiles_generator/
To unpack, run:
tar xvf subset_1M.tar.xz && tar xvf subset_1M_dump.tar.gz && rm subset_1M_dump.tar.gz
You'll get the following data:
subset_1M.smi - list of 1M source SMILES
subset_1M_dump - directory with images
subset_1M_result.csv - list of FGSMILES/pathcode pairs; the first 3 characters of each pathcode correspond to its subdirectory in subset_1M_dump
subset_1M_fails.csv - list of molecules from subset_1M.smi for which generation failed
subset_1M_grpcounter.lst - list of counted groups used in this generation
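The pathcode-to-subdirectory mapping can be explored with a small bash sketch like the one below. It is not part of the released tooling and rests on assumptions not stated above: that subset_1M_result.csv is comma-separated with the FGSMILES in the first column and the pathcode in the last, that it has no header row, and that the first 3 characters of a pathcode name a single subdirectory of subset_1M_dump. The function names pathcode_subdir and list_pairs are hypothetical. Check the real files before relying on it.

```shell
# Hypothetical helpers; verify the CSV layout and directory structure first.

# Print the subdirectory prefix (first 3 characters) of a pathcode.
# ASSUMPTION: the 3 characters form one subdirectory name, not nested levels.
pathcode_subdir() {
  local pathcode="$1"
  printf '%s\n' "${pathcode:0:3}"
}

# Read "FGSMILES,pathcode" lines from a CSV file and print
# "subdir<TAB>pathcode<TAB>FGSMILES" for each non-blank line.
# ASSUMPTION: comma-separated, pathcode in the last column, no header row.
list_pairs() {
  local csv="$1" line fgsmiles pathcode
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"                  # tolerate Windows line endings
    [[ -z "$line" ]] && continue          # skip blank lines
    fgsmiles="${line%,*}"                 # everything before the last comma
    pathcode="${line##*,}"                # everything after the last comma
    printf '%s\t%s\t%s\n' "$(pathcode_subdir "$pathcode")" "$pathcode" "$fgsmiles"
  done < "$csv"
}
```

For example, `list_pairs subset_1M_result.csv | head` would show the first few pairs with their subdirectory prefix. Splitting on the last comma keeps the first column intact even if it happens to contain commas; if the file turns out to have a header row, pipe the output through `tail -n +2`.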
You can generate your own data using https://github.com/syntelly/img2smiles_generator/
Created:
2021-07-05



