
NoSta-D -- Korpus von Nicht-Standardvarietäten des Deutschen

Source: DataCite Commons; updated 2023-11-13; indexed 2024-07-13
Download link:
https://fdat.uni-tuebingen.de/records/3q119-kxk41
Description:
Corpus of different varieties of German. Each subcorpus is a subset of an existing corpus, named in parentheses: (1) historical data (Anselm Corpus), chat data (Dortmund Chat Corpus), learner data (Falko), spoken data (BeMaTaC), and literary prose (Kafka); (2) newspaper texts (TüBa-D/Z). The chat, spoken, prose, and newspaper subcorpora contain approximately 5,000 tokens each, the historical subcorpus about 1,000 tokens, and the learner subcorpus about 2,900 tokens. Every subcorpus is annotated with the following layers: token and sentence boundaries; normalization; POS tags and dependency relations; named entities; coreference.
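The description gives approximate subcorpus sizes and a list of annotation layers. The following is a minimal Python sketch, assuming a hypothetical in-memory representation: the names `SUBCORPUS_TOKENS`, `Token`, `Sentence`, and `total_tokens` are illustrative only and do not reflect the corpus's distributed file format. It sums the stated sizes (roughly 23,900 tokens overall) and bundles the listed layers per token.

```python
from dataclasses import dataclass, field
from typing import Optional

# Approximate subcorpus sizes in tokens, as stated in the description.
SUBCORPUS_TOKENS = {
    "historical (Anselm)": 1_000,
    "chat (Dortmund Chat Corpus)": 5_000,
    "learner (Falko)": 2_900,
    "spoken (BeMaTaC)": 5_000,
    "literary prose (Kafka)": 5_000,
    "newspaper (TüBa-D/Z)": 5_000,
}


@dataclass
class Token:
    """Hypothetical record holding the annotation layers named in the description.

    Field names and conventions are illustrative assumptions, not the corpus format.
    """
    form: str                            # token as it appears in the original text
    normalized: str                      # normalized form
    pos: str                             # part-of-speech tag
    head: int                            # index of the dependency head (0 = root, assumed convention)
    deprel: str                          # dependency relation label
    named_entity: Optional[str] = None   # named-entity label, if any
    coref_id: Optional[int] = None       # coreference chain id, if any (simplified to one per token)


@dataclass
class Sentence:
    # One Sentence object per annotated sentence boundary.
    tokens: list[Token] = field(default_factory=list)


def total_tokens(sizes: dict[str, int]) -> int:
    """Sum the approximate token counts over all subcorpora."""
    return sum(sizes.values())


if __name__ == "__main__":
    print(total_tokens(SUBCORPUS_TOKENS))
```

Storing coreference as a single chain id per token is a simplification: coreference is usually annotated on mentions that can span several tokens, so a real loader would more likely keep mention spans separately.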
Provided by:
University of Tübingen
Created:
2023-11-13