Data set of the paper "Publishing an OCR ground truth data set for reuse in an unclear copyright setting"
Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4742067
Description:
The data set consists of a METS file for each of the PDFs used for transcription and a directory, data/page_xml, that contains the ground truth transcriptions in PAGE-XML format. A data paper with a detailed description of the data set will be published in parallel with the data set and linked here as soon as it is available. The corresponding source code is available at https://github.com/millawell/ocr-data/tree/1.1
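To illustrate how the transcriptions in data/page_xml might be read, here is a minimal Python sketch that collects the line-level text from PAGE-XML files. It is not taken from the authors' repository. The function names (`text_lines`, `read_directory`) are hypothetical, the `.xml` file extension is an assumption, and because the exact PAGE schema version used in the data set is not stated here, elements are matched by local name rather than by a fixed namespace.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def text_lines(page_xml_source) -> List[str]:
    """Return the text of each TextLine in one PAGE-XML document.

    `page_xml_source` is a file name or a binary file object.
    """
    root = ET.parse(page_xml_source).getroot()
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Use only a TextEquiv that is a direct child of the line, so that
        # word-level TextEquiv elements are not counted a second time.
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for sub in child:
                if local_name(sub.tag) == "Unicode":
                    lines.append(sub.text or "")
                    break
            break  # keep the first TextEquiv of the line only
    return lines


def read_directory(page_xml_dir: str = "data/page_xml") -> Dict[str, List[str]]:
    """Map each PAGE-XML file (path relative to the directory) to its lines.

    Assumption: transcription files carry the extension '.xml'.
    """
    base = Path(page_xml_dir)
    return {
        str(path.relative_to(base)): text_lines(str(path))
        for path in sorted(base.rglob("*.xml"))
    }
```

Calling `read_directory()` from the root of the downloaded data set would return a dictionary keyed by file path. Lines without a line-level `TextEquiv` are skipped rather than reported as empty strings.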
Created:
2021-05-12



