Jing bao ground truth – text block crops and annotations
Source: DataCite Commons. Updated 2024-06-18; indexed 2025-04-17.
Download link:
https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/data/PVYWKB
Description:
This data set accompanies the paper "Language Model Assisted OCR Classification for Republican Chinese Newspaper Text", JDADH 11/2023. In that work, we present methods to obtain a neural optical character recognition (OCR) tool for article blocks in a Republican Chinese newspaper. The data set contains two subsets: (1) pairs of text block crops and corresponding ground truth annotations from the April 1920, 1930 and 1939 issues of the Jingbao newspaper (jingbao_annotated_crops.zip); (2) labeled images of single characters, automatically cropped from the April 1939 issues of the Jingbao using separators generated from projection profiles (jingbao_char_imgs.zip).
Provider:
heiDATA
Created:
2022-06-23
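
Illustration: the description says the single-character images were cropped "using separators generated from projection profiles". The following is a minimal sketch of that general technique, not the authors' actual pipeline. It assumes a binarized image given as nested lists (1 = ink, 0 = background) and vertically set text, so columns are split first and characters second. All function names and the `threshold` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input format)


def column_profile(img: Image) -> List[int]:
    """Count ink pixels in each pixel column (vertical projection)."""
    return [sum(col) for col in zip(*img)]


def row_profile(img: Image) -> List[int]:
    """Count ink pixels in each pixel row (horizontal projection)."""
    return [sum(row) for row in img]


def ink_runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where the profile exceeds threshold.

    The gaps between consecutive runs act as the separators.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def character_boxes(img: Image, threshold: int = 0) -> List[Tuple[int, int, int, int]]:
    """Split into text columns, then split each column into characters.

    Returns (top, bottom, left, right) boxes with half-open ranges.
    """
    boxes = []
    for left, right in ink_runs(column_profile(img), threshold):
        strip = [row[left:right] for row in img]
        for top, bottom in ink_runs(row_profile(strip), threshold):
            boxes.append((top, bottom, left, right))
    return boxes


# Toy example: two text columns; the left one holds two "characters".
toy = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
boxes = character_boxes(toy)
```

On real scans a purely gap-based split tends to over-segment characters that contain internal white bands (for example 二), so a practical implementation would typically add size constraints or merge steps on top of this.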



