NoSta-D -- Korpus von Nicht-Standardvarietäten des Deutschen
Source: DataCite Commons. Updated: 2023-11-13. Indexed: 2024-07-13.
Download link:
https://fdat.uni-tuebingen.de/records/3q119-kxk41
Description:
Corpus of different varieties of German. The subcorpora are subsets of other corpora, named in parentheses: (1) historical data (Anselm Corpus), chat data (Dortmund Chat Corpus), learner data (Falko), spoken data (BeMaTaC), and literary prose (Kafka); (2) newspaper texts (TüBa-D/Z). The chat, spoken, prose, and newspaper subcorpora each contain approximately 5,000 tokens; the historical subcorpus contains 1,000 tokens and the learner subcorpus 2,900 tokens.
Each subcorpus is annotated with the following information: token and sentence boundaries; normalization; POS tags and dependency relations; named entities; coreference.
Provider:
University of Tübingen
Created:
2023-11-13



