Performance of chemical structure string representations for chemical image recognition using transformers dataset
Source: NIAID Data Ecosystem. Indexed: 2026-03-13
Download link:
https://zenodo.org/record/5155036
Description:
The datasets contain the string representations used for the DECIMER short communication paper.
ChEMBL dataset:
Train and test datasets downloaded from ChEMBL and curated. Includes data with and without stereochemistry, separated into Canonical and Isomeric sets.
String representations: SMILES, DeepSMILES, SELFIES, and InChI.
Train dataset: 1.5 million molecules
Test dataset: ~100K molecules
PubChem dataset:
Train and test datasets downloaded from PubChem and curated. Includes data with and without stereochemistry, separated into Canonical and Isomeric sets.
String representations: SMILES, DeepSMILES, and SELFIES.
Train dataset: 3 million molecules
Test dataset: 250K molecules
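
Both datasets distinguish entries with and without stereochemistry. As a rough illustration of what that distinction looks like at the string level, the sketch below checks a SMILES string for stereochemistry symbols: `@` for tetrahedral centres, `/` and `\` for double-bond geometry. The helper names `has_stereo_markers` and `split_by_stereo` and the example molecules are hypothetical and are not taken from the dataset, whose file layout is not described here. The check applies to SMILES only; InChI, for example, uses `/` as a layer separator, so it would not be meaningful there.

```python
def has_stereo_markers(smiles: str) -> bool:
    """Return True if a SMILES string contains stereochemistry symbols.

    Hypothetical helper for illustration, not part of the dataset tooling.
    '@' marks tetrahedral chirality; '/' and '\\' mark double-bond geometry.
    """
    return any(symbol in smiles for symbol in ("@", "/", "\\"))


def split_by_stereo(smiles_list):
    """Partition SMILES strings into (with stereo, without stereo) lists."""
    with_stereo = [s for s in smiles_list if has_stereo_markers(s)]
    without_stereo = [s for s in smiles_list if not has_stereo_markers(s)]
    return with_stereo, without_stereo


# Example molecules chosen for illustration only (assumption, not dataset rows).
examples = ["C[C@H](N)C(=O)O", "CC(N)C(=O)O", "F/C=C/F", "CCO"]
with_stereo, without_stereo = split_by_stereo(examples)
print("with stereo:", with_stereo)
print("without stereo:", without_stereo)
```

This is a purely textual check. It does not parse or validate the molecule, so a cheminformatics toolkit would still be needed to canonicalise structures or convert between representations.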
Created:
2022-04-13



