Document to Chemical Structure Benchmarks (D2C-RND and D2C-UNI)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10978811
Description:
D2C-RND and D2C-UNI are benchmarks for simultaneously evaluating document segmentation, image classification, and molecule recognition.
Each of these datasets contains three subsets: a segmentation set giving the locations of chemical images on document pages, a classification set giving the class of each chemical image, and a recognition set giving graph descriptions of the chemical structures. The subsets are nested: the molecules in the recognition subset are taken from images in the classification subset, which are in turn taken from pages in the segmentation subset.
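To make the nesting of the three subsets concrete, here is a minimal sketch in Python. The record types and every field name (`page_id`, `image_id`, `bbox`, `label`, `atoms`, `bonds`) are hypothetical illustrations, not the actual file format of the Zenodo release; the point is only that each recognition record should link to a classification record, and each classification record to a segmentation page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str  # hypothetical identifier of a page in the segmentation subset


@dataclass
class ChemicalImage:
    image_id: str
    page_id: str  # page the image was located on (segmentation subset)
    bbox: tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) location
    label: str  # hypothetical class name (classification subset)


@dataclass
class MoleculeGraph:
    image_id: str  # image the molecule was drawn in (recognition subset)
    atoms: list[str]
    bonds: list[tuple[int, int, int]]  # (atom_i, atom_j, bond_order)


def check_nesting(pages, images, molecules):
    """Return a list of messages describing broken links between the subsets."""
    page_ids = {p.page_id for p in pages}
    image_ids = {img.image_id for img in images}
    errors = []
    # Every classified image must come from a page in the segmentation subset.
    for img in images:
        if img.page_id not in page_ids:
            errors.append(f"image {img.image_id} refers to unknown page {img.page_id}")
    # Every recognized molecule must come from an image in the classification subset.
    for mol in molecules:
        if mol.image_id not in image_ids:
            errors.append(f"molecule refers to unknown image {mol.image_id}")
    return errors
```

A check of this kind is one way to confirm, after loading the data into whatever structures you use, that the page → image → molecule chain is intact before scoring a pipeline end to end.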
D2C-RND is sampled at random over chemical images, whereas D2C-UNI is sampled uniformly with respect to year of publication and publishing office.
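The source does not describe the exact sampling procedure, so the following is only a hypothetical illustration of the difference between the two designs. `sample_random` draws a simple random sample, in which frequent years and offices dominate; `sample_uniform` groups items into (year, office) strata and draws round-robin across them so each stratum contributes roughly equally. The item fields `year` and `office` are assumptions made for the sketch.

```python
import random
from collections import defaultdict


def sample_random(items, k, seed=0):
    """Simple random sample: strata are represented in proportion to their size."""
    rng = random.Random(seed)
    return rng.sample(items, k)


def sample_uniform(items, k, seed=0):
    """Round-robin draw across hypothetical (year, office) strata.

    Each pass takes one item from every non-empty stratum, so strata contribute
    roughly equally until k items are drawn or all strata are exhausted.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[(item["year"], item["office"])].append(item)
    for bucket in strata.values():
        rng.shuffle(bucket)  # random choice within each stratum
    keys = sorted(strata)
    picked = []
    while len(picked) < k and any(strata[key] for key in keys):
        for key in keys:
            if strata[key] and len(picked) < k:
                picked.append(strata[key].pop())
    return picked
```

With a corpus dominated by one (year, office) pair, `sample_random` mostly returns items from that pair, while `sample_uniform` spreads the draw evenly across all pairs present.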
In total, these benchmarks contain 700 pages, 753 molecule images, and 364 molecular graphs.
These datasets are part of PatCID: an open-access dataset of chemical structures in patent documents.
Created:
2024-06-18