Fine-Grained Font Groups and Transcriptions

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/7614684

Description:

This dataset, produced within the OCR-D project, consists of transcriptions of early modern prints that use multiple fonts. We provide it in two formats: text lines and full pages.

Content:

lines.zip: This archive contains the transcribed text lines in the usual form of image/text file pairs. The images have been cropped but not otherwise processed (i.e., no binarization, size normalization, or any other modification). In addition, each text line has an extra text file with the ".font" extension. This file has the same number of characters as the transcription and encodes the font group of each character ("a" for Antiqua, "b" for Bastarda, ...).

full_pages.zip: This archive contains the full-size images used to produce lines.zip, as well as the ground truth produced with FRAT.

md.json: This file contains metadata such as the names of the books, place of production, date of production, ...

public_test_set.zip: This archive contains test text lines without ground truth. You can evaluate your performance on these text lines with Codalab. We created two competitions: one for methods trained only on the provided data, and another with no restriction on using extra data.

More information on the competition is available on its website and in its publication: van der Loop, Janne, et al. "ICDAR 2024 Competition on Multi Font Group Recognition and OCR." International Conference on Document Analysis and Recognition. Cham: Springer Nature Switzerland, 2024.
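Because each ".font" file has exactly one code per transcribed character, a line can be aligned character by character and split into runs of the same font group. The following is a minimal Python sketch of that idea, not the authors' tooling. Assumptions: the transcription for a line is stored as `<stem>.txt` next to `<stem>.font` (the description does not name the text file extension, so this naming is hypothetical), trailing line breaks are not part of either file's content, and only the codes "a" (Antiqua) and "b" (Bastarda) are mapped to names, since those are the only ones the description states. The helper names `align_fonts`, `font_runs`, and `load_line` are illustrative.

```python
from itertools import groupby
from pathlib import Path

# Only these two codes are stated in the dataset description; other codes
# are reported as unknown instead of being guessed.
KNOWN_FONT_GROUPS = {"a": "Antiqua", "b": "Bastarda"}


def align_fonts(text: str, fonts: str) -> list[tuple[str, str]]:
    """Pair each transcribed character with its font-group code."""
    if len(text) != len(fonts):
        raise ValueError(
            f"length mismatch: {len(text)} characters vs {len(fonts)} font codes"
        )
    return list(zip(text, fonts))


def font_runs(text: str, fonts: str) -> list[tuple[str, str]]:
    """Split a line into maximal runs of consecutive characters sharing a font code."""
    runs = []
    for code, group in groupby(align_fonts(text, fonts), key=lambda pair: pair[1]):
        runs.append((code, "".join(char for char, _ in group)))
    return runs


def load_line(stem: Path) -> list[tuple[str, str]]:
    """Read <stem>.txt and <stem>.font (hypothetical naming) and return font runs."""
    # Build names by appending, so stems that contain dots are not truncated.
    text_path = stem.parent / (stem.name + ".txt")
    font_path = stem.parent / (stem.name + ".font")
    # Strip only trailing line breaks; trailing spaces may be transcribed content.
    text = text_path.read_text(encoding="utf-8").rstrip("\r\n")
    fonts = font_path.read_text(encoding="utf-8").rstrip("\r\n")
    return font_runs(text, fonts)


if __name__ == "__main__":
    # Toy line, not taken from the dataset.
    for code, run in font_runs("Der Text", "bbbbaaaa"):
        print(KNOWN_FONT_GROUPS.get(code, f"unknown code {code!r}"), repr(run))
```

On the toy line this prints `Bastarda 'Der '` followed by `Antiqua 'Text'`. Raising on a length mismatch, rather than truncating with `zip`, makes a misaligned or mismatched file pair visible immediately.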

Created:

2025-02-24
