storytracer/German-PD-Newspapers
Source: Hugging Face. Updated 2024-03-20; indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/storytracer/German-PD-Newspapers
Resource description:
---
license: cc0-1.0
task_categories:
- text-generation
language:
- de
tags:
- newspapers
- ocr
- public domain
pretty_name: Public Domain Newspapers (German)
size_categories:
- 10B<n<100B
---
# Dataset Card for Public Domain Newspapers (German)
This dataset contains 13 billion words of OCR text extracted from German historical newspapers.
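At this size, streaming is likely more practical than a full download. The following is a minimal sketch, not an official loading recipe: it assumes the Hugging Face `datasets` library is installed, that a split named `train` exists, and that the OCR text lives in a column called `text`. Neither the split name nor the column name is stated in this card, so both should be checked against the dataset viewer. The names `stream_records` and `count_words` are hypothetical helpers introduced only for illustration.

```python
from typing import Iterable, Iterator, Mapping

DATASET_ID = "storytracer/German-PD-Newspapers"  # identifier from the dataset page
TEXT_FIELD = "text"  # ASSUMPTION: column holding the OCR text; verify before use


def stream_records(split: str = "train") -> Iterator[Mapping]:
    """Lazily iterate over records without downloading the whole dataset.

    ASSUMPTION: a split named "train" exists for this dataset.
    """
    # Imported here so count_words() stays usable without `datasets` installed.
    from datasets import load_dataset

    return iter(load_dataset(DATASET_ID, split=split, streaming=True))


def count_words(records: Iterable[Mapping], text_field: str = TEXT_FIELD) -> int:
    """Count whitespace-separated tokens in `text_field` across the records.

    Records with a missing or empty field contribute zero words.
    """
    total = 0
    for record in records:
        text = record.get(text_field) or ""
        total += len(text.split())
    return total
```

Under those assumptions, `count_words(itertools.islice(stream_records(), 100))` would count the words in the first 100 streamed records. Whitespace splitting is only a rough proxy for a word count, particularly on noisy OCR output.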
## Dataset Details
### Dataset Description
- **Curated by:** [Sebastian Majstorovic](https://www.storytracer.org)
- **Language(s) (NLP):** German
- **License:** Dataset: CC0, Texts: Public Domain
### Dataset Sources
- **Repository:** https://www.deutsche-digitale-bibliothek.de/newspaper
### Copyright & License
The newspaper texts have been determined to be in the public domain by the institutions that provided them to the newspaper portal of the German Digital Library (Deutsche Digitale Bibliothek). The dataset itself, excluding the texts, is licensed under the [CC0 license](https://creativecommons.org/public-domain/cc0/).
Provided by:
storytracer



