de-Corp: A Corpus of German Fiction and Non-Fiction (1780-1930)
Source: Figshare | Updated: 2025-06-22 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/de-Corp_A_Corpus_of_German_Fiction_and_Non-Fiction_1780-1930_/29377511
Description:
de-Corp is a corpus of 6,500 German-language fiction and non-fiction texts published between 1780 and 1930, compiled from the German and U.S. Project Gutenberg libraries. It contains more than 18 million sentences by over 1,400 unique authors and includes detailed metadata on genre, publication year, and author gender. Fiction makes up the majority of the texts and is further divided into sub-genres according to an institutional classification standard (WGS). The dataset supports large-scale historical and literary analysis and is particularly suited to research in Computational Literary Studies and Computational Linguistics.
Created:
2025-06-22