NoSta-D -- Korpus von Nicht-Standardvarietäten des Deutschen

Name: NoSta-D -- Korpus von Nicht-Standardvarietäten des Deutschen
Creator: University of Tübingen
Published: 2023-11-13 09:44:49
License: No description available

Source: DataCite Commons (updated 2023-11-13, indexed 2024-07-13)

Download link:

https://fdat.uni-tuebingen.de/records/3q119-kxk41

Description:

A corpus of different varieties of German. The subcorpora are subsets of other corpora, with the source corpus given in parentheses: 1) historical data (Anselm Corpus), chat data (Dortmund Chat Corpus), learner data (Falko), spoken data (BeMaTaC), and literary prose (Kafka); 2) newspaper texts (TüBa-D/Z). The chat, spoken, prose, and newspaper subcorpora each contain approximately 5,000 tokens; the historical subcorpus contains 1,000 tokens and the learner subcorpus 2,900 tokens. Each subcorpus is annotated with token and sentence boundaries, normalization, POS tags and dependency relations, named entities, and coreference.
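
As a rough sanity check on the sizes quoted in the description, the sketch below tallies the approximate token counts per subcorpus. The figures are the rounded values from the description, not counts taken from the data files; the dictionary keys and function names are illustrative and not part of the corpus distribution.

```python
# Approximate subcorpus sizes in tokens, copied from the description.
# These are rounded figures, not counts computed from the corpus files.
SUBCORPUS_TOKENS = {
    "historical (Anselm Corpus)": 1_000,
    "chat (Dortmund Chat Corpus)": 5_000,
    "learner (Falko)": 2_900,
    "spoken (BeMaTaC)": 5_000,
    "literary prose (Kafka)": 5_000,
    "newspaper (TüBa-D/Z)": 5_000,
}


def total_tokens(sizes: dict[str, int]) -> int:
    """Return the sum of all subcorpus sizes."""
    return sum(sizes.values())


def token_shares(sizes: dict[str, int]) -> dict[str, float]:
    """Return each subcorpus size as a fraction of the total."""
    total = total_tokens(sizes)
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"approximate total: {total_tokens(SUBCORPUS_TOKENS):,} tokens")
    for name, share in token_shares(SUBCORPUS_TOKENS).items():
        print(f"{name}: {share:.1%}")
```

Under these rounded figures the corpus comes to roughly 23,900 tokens, with the historical and learner subcorpora noticeably smaller than the other four.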

Provider:

University of Tübingen

Created:

2023-11-13