Fine-Grained Font Groups and Transcriptions
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/7614684
Description:
This dataset, produced within the OCR-D project, is composed of transcriptions of early modern prints with multiple fonts. We provide it in two formats: text lines and full pages.
Content
lines.zip
This archive contains the transcribed text lines in the usual combination of pairs of images and text files. The images have been cropped but not otherwise processed (i.e., no binarization, size normalization, or any other modification).
Moreover, for each text line, there is an extra text file with the ".font" extension. It has the same number of characters as the transcription, and encodes the font group of each character ("a" for Antiqua, "b" for Bastarda, ...).
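The one-to-one alignment between a transcription and its ".font" file can be illustrated with a short Python sketch. It is not part of the dataset tooling. The function names `font_runs` and `read_line_pair`, the ".txt" extension for transcriptions, UTF-8 encoding, and a single trailing newline in each file are all assumptions made for illustration; check them against the actual contents of lines.zip.

```python
from itertools import groupby
from pathlib import Path


def font_runs(text: str, fonts: str) -> list[tuple[str, str]]:
    """Split a transcription into runs of consecutive characters
    that share the same font-group label (hypothetical helper)."""
    if len(text) != len(fonts):
        raise ValueError(
            f"length mismatch: {len(text)} characters vs {len(fonts)} labels"
        )
    runs = []
    # Pair each character with its label, then group neighbours by label.
    for label, group in groupby(zip(text, fonts), key=lambda pair: pair[1]):
        runs.append((label, "".join(ch for ch, _ in group)))
    return runs


def read_line_pair(stem: Path) -> tuple[str, str]:
    """Read '<stem>.txt' and '<stem>.font' (assumed naming and UTF-8 encoding)."""
    text = Path(f"{stem}.txt").read_text(encoding="utf-8").rstrip("\n")
    fonts = Path(f"{stem}.font").read_text(encoding="utf-8").rstrip("\n")
    return text, fonts
```

For example, `font_runs("Anno Domini", "aaaaabbbbbb")` yields one Antiqua run ("Anno ") followed by one Bastarda run ("Domini"). The length check is worth keeping, because a silent mismatch (for instance from Unicode normalization that changes the character count) would shift every label after the mismatch.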
full_pages.zip
This archive contains the full-size images used to produce lines.zip, as well as the ground truth produced with FRAT.
md.json
This file contains metadata such as the names of the books, place of production, date of production, ...
public_test_set.zip
This archive contains test text lines without ground truth. You can evaluate your results on these text lines with CodaLab. We created two competitions: one for methods trained only on the provided data, and one with no restriction on the use of extra data. More information on the competition is available on its website and in its publication:
van der Loop, Janne, et al. "ICDAR 2024 Competition on Multi Font Group Recognition and OCR." International Conference on Document Analysis and Recognition. Cham: Springer Nature Switzerland, 2024.
Created:
2025-02-24



