storytracer/German-PD-Newspapers

Name: storytracer/German-PD-Newspapers
Creator: storytracer
Published: 2024-03-20 17:09:17
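License: no description provided on this page (the dataset card below gives CC0 for the dataset and public domain for the texts)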

Platform: Hugging Face
Updated: 2024-03-20
Indexed: 2024-06-15

Download link:

https://hf-mirror.com/datasets/storytracer/German-PD-Newspapers
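The page does not say how to load the data programmatically. The following is a minimal sketch that assumes the Hugging Face `datasets` library is installed. It also assumes that the dataset exposes a `train` split whose records keep the OCR text in a field named `text`. Both are assumptions, so check the actual schema on the dataset page. The sketch streams records instead of downloading everything and counts whitespace-separated words, which offers a rough way to sanity-check the card's figure of 13 billion words on a sample. The helper names `count_words` and `stream_records` are hypothetical.

```python
from itertools import islice
from typing import Iterable, Mapping, Optional


def count_words(
    records: Iterable[Mapping[str, object]],
    text_field: str = "text",  # hypothetical column name; verify against the real schema
    limit: Optional[int] = None,  # None means "consume every record"
) -> int:
    """Sum whitespace-separated tokens over at most `limit` records."""
    total = 0
    for record in islice(records, limit):
        # Missing or empty fields count as zero words.
        text = record.get(text_field) or ""
        total += len(str(text).split())
    return total


def stream_records(repo_id: str = "storytracer/German-PD-Newspapers"):
    """Lazily stream records from the Hub (needs `pip install datasets` and network access).

    The "train" split name is an assumption, not something the dataset card confirms.
    """
    # Imported here so count_words() can be used without the datasets package.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)
```

For example, `count_words(stream_records(), limit=1_000)` counts words in the first 1,000 records without fetching the full dataset. Splitting on whitespace is only a crude notion of "word", and the card does not say how its count was produced. To go through the mirror above, a common approach is to set the environment variable `HF_ENDPOINT=https://hf-mirror.com` before running. Whether this works depends on the installed `huggingface_hub` version.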



Resource summary:

---
license: cc0-1.0
task_categories:
- text-generation
language:
- de
tags:
- newspapers
- ocr
- public domain
pretty_name: Public Domain Newspapers (German)
size_categories:
- 10B<n<100B
---

# Dataset Card for Public Domain Newspapers (German)

This dataset contains 13 billion words of OCR text extracted from German historical newspapers.

## Dataset Details

### Dataset Description

- **Curated by:** [Sebastian Majstorovic](https://www.storytracer.org)
- **Language(s) (NLP):** German
- **License:** Dataset: CC0, Texts: Public Domain

### Dataset Sources

- **Repository:** https://www.deutsche-digitale-bibliothek.de/newspaper

### Copyright & License

The newspaper texts have been determined to be in the public domain by the institutions that provided them to the newspaper portal of the German Digital Library (Deutsche Digitale Bibliothek). The dataset itself, excluding the texts, is licensed under the [CC0 license](https://creativecommons.org/public-domain/cc0/).

Provided by:

storytracer

