Austrian Baroque Corpus

Source: SSH Open Marketplace. Updated 2023-10-17. Indexed 2024-08-03.

Download link:

https://marketplace.sshopencloud.eu/dataset/zUYGPU

Description:

This historical corpus contains sermons from 1650 to 1750. For linguistic annotation, each token was automatically assigned a morphosyntactic word class with the [TreeTagger](https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) software, using the 54-tag Stuttgart-Tübingen TagSet ([STTS](https://homepage.ruhr-uni-bochum.de/Stephen.Berman/Korpuslinguistik/Tagsets-STTS.html)) as the classification system. For lemmatization, each token was assigned a normalized base form, with the [Duden](http://www.duden.de/) and the [German dictionary by Jacob and Wilhelm Grimm](http://www.dwb.uni-trier.de/) serving as reference works. The part-of-speech tags and lemmas were then checked manually. The corpus is available through a dedicated concordancer.
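The annotation described above attaches a word class and a lemma to every token. As a rough illustration only, the following Python sketch parses tab-separated token / tag / lemma lines, the column layout TreeTagger typically emits, and counts the tags. The sample sentence, the `Token` type, and the `parse_tagged` helper are assumptions made up for this example; they are not taken from the corpus, and the corpus itself is accessed through its concordancer rather than in this form.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    """One annotated token: surface form, STTS tag, normalized lemma."""
    form: str
    pos: str
    lemma: str


def parse_tagged(text: str) -> list[Token]:
    """Parse lines of the form 'form<TAB>tag<TAB>lemma' (hypothetical helper)."""
    tokens = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        tokens.append(Token(*parts))
    return tokens


# Invented sample for illustration; not an excerpt from the corpus.
SAMPLE = (
    "Die\tART\tdie\n"
    "Predigt\tNN\tPredigt\n"
    "war\tVAFIN\tsein\n"
    "lang\tADJD\tlang\n"
    ".\t$.\t.\n"
)

tokens = parse_tagged(SAMPLE)
tag_counts = Counter(tok.pos for tok in tokens)

for tok in tokens:
    print(f"{tok.form:10} {tok.pos:6} {tok.lemma}")
```

Keeping form, tag, and lemma together in one record mirrors the per-token annotation the description mentions, and makes a manual check of tags and lemmas a matter of reviewing one row at a time.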

Created:

2023-10-17
