
German Parliamentary Corpus (GerParCor)

Source: SSH Open Marketplace. Updated 2024-09-30; indexed 2024-10-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/TCOYFe
Description:
This corpus contains German-language parliamentary proceedings, most of them historical, from the 19th, 20th, and 21st centuries, covering both state and federal parliaments. It also includes text converted from scanned protocols, in particular protocols printed in [Fraktur](https://en.wikipedia.org/wiki/Fraktur), which were digitized via an OCR process based on [Tesseract](https://github.com/tesseract-ocr/tesseract). All protocols were preprocessed with the NLP pipeline [spaCy v3](https://spacy.io/usage/v3/) and automatically annotated with metadata on their session date. The corpus is distributed in the XML format of the [UIMA project](https://uima.apache.org/) and is available for download from GitHub.
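
As a rough illustration of working with the UIMA XML format mentioned above, the following Python sketch reads the document text out of a single file. It assumes the files follow the standard UIMA XMI CAS serialization, in which the text is stored in the `sofaString` attribute of a `cas:Sofa` element; the listing does not confirm the exact layout of the corpus files. The function name `extract_document_text` and the sample document are hypothetical and not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Namespace of the CAS elements in the UIMA XMI serialization.
# Assumption: the corpus files use this standard serialization.
CAS_NS = "http:///uima/cas.ecore"


def extract_document_text(xmi_source: str) -> str:
    """Return the text held by the first cas:Sofa element of an XMI document."""
    root = ET.fromstring(xmi_source)
    # cas:Sofa is a direct child of the xmi:XMI root element.
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    if sofa is None:
        raise ValueError("no cas:Sofa element found")
    return sofa.get("sofaString", "")


# Hypothetical miniature document for illustration only.
SAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">'
    '<cas:NULL xmi:id="0"/>'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'mimeType="text" sofaString="Die Sitzung ist geschlossen."/>'
    "</xmi:XMI>"
)

if __name__ == "__main__":
    print(extract_document_text(SAMPLE_XMI))
```

For real files, the same function could be applied to the file contents after reading (and, if the files are compressed, decompressing) them. Annotation layers such as tokens or the session date would live in further elements whose type names depend on the type system used, which is not specified here.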
Created:
2024-09-30