Table Recognition Benchmark on Biomedical Literature on Neurological Disorders
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://zenodo.org/records/5549977
Description:
The dataset contains 1,650 tables from 1,164 PubMed Central Open Access (PMC OA) articles on neurological disorders. The tables are structured in the International Conference on Document Analysis and Recognition (ICDAR) format. An additional CSV file assigns each table to one of three complexity classes, in the format:
class document_id table_id
where the classes are:
0 = simple
1 = complicated
2 = complex
The original record links to a repository with scripts for bulk-downloading the PDF sources from PMC, and to a script for evaluating results against the ground-truth data.
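As an illustration of the labeling format described above, the following is a minimal Python sketch that reads "class document_id table_id" records and counts tables per complexity class. It is not part of the dataset's own tooling. The function names parse_labels and count_by_class are hypothetical, and the delimiter is an assumption: the description does not state whether fields are separated by commas or whitespace, so the sketch accepts both and skips any line that does not start with a class value of 0, 1, or 2.

```python
import re
from collections import Counter

# Class labels as given in the dataset description.
COMPLEXITY = {0: "simple", 1: "complicated", 2: "complex"}


def parse_labels(lines):
    """Parse 'class document_id table_id' records into (int, str, str) tuples.

    Assumption: fields are separated by commas and/or whitespace.
    Lines without exactly three fields, or whose first field is not
    0, 1, or 2 (for example a header row), are skipped.
    """
    records = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 3 or fields[0] not in {"0", "1", "2"}:
            continue
        records.append((int(fields[0]), fields[1], fields[2]))
    return records


def count_by_class(records):
    """Return the number of tables per complexity class name."""
    counts = Counter(cls for cls, _, _ in records)
    return {COMPLEXITY[c]: counts.get(c, 0) for c in COMPLEXITY}
```

To apply it to the actual file, pass an open file handle, for example parse_labels(open(path, encoding="utf-8")), where path is wherever the CSV was saved locally.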
Created:
2023-06-28



