Dataset of Pages from Early Printed Books with Multiple Font Groups

Indexed by NIAID Data Ecosystem on 2026-03-12

Download link:

https://zenodo.org/record/3366685

Description:

This dataset consists of photographs, at various resolutions, of 35,623 pages of printed books dating from the 15th to the 18th century. Experts assigned each page one to five labels corresponding to the font groups used in the text. The font groups are Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura, with two extra classes: one for non-textual content and one for fonts not in this list.

To make downloading easier over slow or unreliable Internet connections, the dataset has been split into several zip files. All zip files must be extracted into the same folder. The CSV files containing the labels should ideally be placed in the parent folder. The labels are provided in two CSV files: one for training and tuning font group recognition methods, and one for evaluation. Where several pages come from the same book, special care has been taken to keep all of them in the same subset.

The paper presenting this dataset in detail is "Dataset of Pages from Early Printed Books with Multiple Font Groups", accepted at the 5th International Workshop on Historical Document Imaging and Processing, Sydney, Australia.

We would like to thank the British Library (London), Bayerische Staatsbibliothek München, Staatsbibliothek zu Berlin, Universitätsbibliothek Erlangen, Universitätsbibliothek Heidelberg, Staats- und Universitätsbibliothek Göttingen, Stadt- und Universitätsbibliothek Köln, Württembergische Landesbibliothek Stuttgart and Herzog August Bibliothek Wolfenbüttel for the data they sent us and kindly allowed us to use for this public dataset.
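The extraction step described above (all zip files unpacked into one shared folder) could be scripted roughly as follows. This is a minimal sketch using only the Python standard library. The function name `extract_all_archives` and the folder names in the usage note are assumptions for illustration and are not part of the dataset.

```python
import zipfile
from pathlib import Path


def extract_all_archives(archive_dir, target_dir):
    """Extract every .zip file found in archive_dir into one shared folder.

    Both arguments are hypothetical paths chosen by the user. Returns the
    names of the extracted files (directory entries are skipped).
    """
    archive_dir = Path(archive_dir)
    target_dir = Path(target_dir)
    # Create the shared target folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(archive_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # All archives are unpacked into the same folder.
            zf.extractall(target_dir)
            extracted.extend(
                name for name in zf.namelist() if not name.endswith("/")
            )
    return extracted
```

For example, `extract_all_archives("downloads", "dataset/pages")` would unpack every archive in a hypothetical `downloads` folder into `dataset/pages`, after which the two label CSV files could be placed in the parent folder `dataset`.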

Created:

2021-09-08
