SBB/Colibri
Source: Hugging Face. Updated: 2025-07-15. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/SBB/Colibri
Description:
The Colibri dataset consists of 53,533 illustrations extracted from 3,412 children's and youth books published between 1800 and 1925. It includes metadata and annotations and is intended to support research and the development of AI applications in the field of historical cultural data. Curated by librarians and researchers at the Berlin State Library, the dataset contains images and text predominantly in German, with some content in other languages. It is distributed under a Creative Commons Attribution 4.0 International license and can be downloaded via a DOI link. The README describes the data collection process, the rationale behind the dataset, the source data, the preprocessing and cleaning steps, the annotation process, and the dataset structure, including file formats and compliance with metadata standards.
Provided by:
SBB



