
storytracer/German-PD-Newspapers

Source: Hugging Face. Updated 2024-03-20. Indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/storytracer/German-PD-Newspapers
Resource description:
---
license: cc0-1.0
task_categories:
- text-generation
language:
- de
tags:
- newspapers
- ocr
- public domain
pretty_name: Public Domain Newspapers (German)
size_categories:
- 10B<n<100B
---

# Dataset Card for Public Domain Newspapers (German)

This dataset contains 13 billion words of OCR text extracted from German historical newspapers.

## Dataset Details

### Dataset Description

- **Curated by:** [Sebastian Majstorovic](https://www.storytracer.org)
- **Language(s) (NLP):** German
- **License:** Dataset: CC0, Texts: Public Domain

### Dataset Sources [optional]

- **Repository:** https://www.deutsche-digitale-bibliothek.de/newspaper

### Copyright & License

The newspaper texts have been determined to be in the Public Domain by the institutions that provided them to the newspaper portal of the German Digital National Library. The dataset itself, excluding the texts, is licensed under the [CC0 license](https://creativecommons.org/public-domain/cc0/).
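Because the card places the corpus in the `10B<n<100B` size category, it may be more practical to stream a few records than to download everything. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and that the repository exposes a `train` split; the split name and the helper `peek_records` are assumptions for illustration, not something the card documents. Column names are also not listed in the card, so the sketch only returns raw records for inspection.

```python
from itertools import islice

# Dataset ID as given in the listing above.
REPO_ID = "storytracer/German-PD-Newspapers"


def peek_records(n=3, split="train"):
    """Return the first n records of the dataset without a full download.

    The split name "train" is an assumption; adjust it if the repository
    uses a different one.
    """
    # Imported lazily so this module can be loaded without the
    # third-party `datasets` package being installed.
    from datasets import load_dataset

    # streaming=True yields records lazily instead of fetching all files.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))


# Example call (needs network access):
#     for record in peek_records():
#         print(sorted(record.keys()))  # inspect the available columns
```

Listing the keys of the first few records is a cheap way to discover the schema before committing to a full download.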
Provided by:
storytracer
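Summary of original information

Dataset card: Public Domain Newspapers (German)

Dataset overview

This dataset contains 13 billion words of OCR text extracted from German historical newspapers.

Dataset details

Dataset description

  • Curated by: Sebastian Majstorovic
  • Language (NLP): German
  • License: Dataset: CC0, Texts: Public Domain

Dataset sources [optional]

  • Repository: https://www.deutsche-digitale-bibliothek.de/newspaper

Copyright and license

The newspaper texts have been determined to be in the Public Domain by the institutions that provided them to the newspaper portal of the German Digital National Library. The dataset itself, excluding the texts, is licensed under the CC0 license.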
