Supplementary data for the manuscript: Image2SMILES: Transformer-based Molecular Optical Recognition Engine

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5069805
Description:
This is the supplementary data for the manuscript "Image2SMILES: Transformer-based Molecular Optical Recognition Engine". It contains image-string pairs generated from 1M SMILES strings. These strings were randomly chosen from the PubChem database. The data was prepared using the code published at https://github.com/syntelly/img2smiles_generator/

To unpack, run:

    tar xvf subset_1M.tar.xz && tar xvf subset_1M_dump.tar.gz && rm subset_1M_dump.tar.gz

You'll get the following data:

subset_1M.smi - list of 1M source SMILES
subset_1M_dump - directory with images
subset_1M_result.csv - list of pairs FGSMILES - pathcode; the first 3 chars of each pathcode are the corresponding subdirs in subset_1M_dump
subset_1M_fails.csv - list of failed molecules from subset_1M.smi
subset_1M_grpcounter.lst - list of counted groups used in this generation

You can generate your own data using https://github.com/syntelly/img2smiles_generator/
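
The description says only that the first 3 chars of a pathcode are the corresponding subdirs in subset_1M_dump. It does not give the exact nesting or the image file extension. The following is a minimal Python sketch of how a pathcode might be resolved to an image file under two assumed readings of that rule: a single subdirectory named by the 3-character prefix, or three nested one-character subdirectories. The helper names `candidate_dirs` and `find_image` are hypothetical and are not part of the dataset or of img2smiles_generator.

```python
from pathlib import Path
from typing import List, Optional


def candidate_dirs(pathcode: str, root: str = "subset_1M_dump") -> List[Path]:
    """Return directories where the image for `pathcode` might be stored.

    Both layouts are assumptions; the dataset description does not say
    which one (if either) is actually used.
    """
    if len(pathcode) < 3:
        raise ValueError("pathcode must have at least 3 characters")
    prefix = pathcode[:3]
    base = Path(root)
    return [
        base / prefix,                              # assumed layout 1: root/abc/
        base / prefix[0] / prefix[1] / prefix[2],   # assumed layout 2: root/a/b/c/
    ]


def find_image(pathcode: str, root: str = "subset_1M_dump") -> Optional[Path]:
    """Return the first file whose name starts with `pathcode`, or None.

    The file extension is not documented, so files are matched by name prefix.
    """
    for directory in candidate_dirs(pathcode, root):
        if not directory.is_dir():
            continue
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and entry.name.startswith(pathcode):
                return entry
    return None
```

After unpacking, calling `find_image` with a pathcode taken from subset_1M_result.csv returns a path if either assumed layout matches, and `None` otherwise. Inspecting a few entries of subset_1M_dump by hand is the reliable way to confirm the real layout.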
Created:
2021-07-05