AustroParl Corpus of Parliamentary Debates

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3819504
Description:
The AustroParl Corpus of Parliamentary Debates, prepared in the PolMine Project, comprises all protocols of plenary sessions in the Austrian Nationalrat between 1996 and 2019. The corpus is built from PDF documents issued by the Nationalrat. The R package frappp has been used to extract structural information from the original text and to prepare an XML version of the corpus (preliminary TEI format). The structural annotation comprises speaker, party affiliation, parliamentary group affiliation, role, legislative period, session, date, interjections, year and agenda item. This release offers a linguistically annotated and indexed version of the corpus. As part of the corpus preparation pipeline, the data has been linguistically annotated (using the TreeTagger and StanfordNLP) and imported into the Corpus Workbench (CWB). The linguistic annotation comprises POS tagging and lemmatization. This language resource is still under active development and comes without any guarantees.
Created:
2020-05-13