Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 3.0
Source: SSH Open MarketPlace | Updated: 2025-07-04 | Indexed: 2025-07-05
Download link:
https://marketplace.sshopencloud.eu/dataset/LfQtd2
Resource description:
ParlaMint is a multilingual set of comparable corpora of parliamentary debates, mostly starting at the end of 2015 and extending to mid-2022, with each corpus between 9 and 125 million words in size. The sessions in the corpora are marked as belonging to the pre-COVID period, the COVID-19 period (after October 2019), or the period after 24 February 2022.
The corpora have extensive metadata about the speakers (name, gender, party affiliation, MP status) and are structured into time-stamped terms, sessions and meetings. Each speech is marked with its speaker and the speaker's role (chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions and applause.
The corpus is available for download from the CLARIN.SI repository and through the concordancer [noSketch Engine](https://www.clarin.si/ske/#open). Note that the version of the corpus without linguistic mark-up is available for download under a [separate CLARIN.SI entry](http://hdl.handle.net/11356/1486).
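For readers who want to reproduce the period split on their own session dates, the following is a minimal sketch of the three-way labelling described above. It is not the corpus's own markup or tooling. The function name `period_label` and the label strings are hypothetical, and the exact boundaries are assumptions read off the description: "after October 2019" is taken to mean from 1 November 2019, "after 24 February 2022" is taken to mean strictly later than that day, and the three periods are assumed to be mutually exclusive.

```python
from datetime import date

# Assumed boundaries, inferred from the description; the corpus markup
# itself is authoritative and may draw these lines differently.
COVID_START = date(2019, 11, 1)      # "after October 2019"
LATE_PERIOD_AFTER = date(2022, 2, 24)  # "after 24 February 2022"


def period_label(session_date: date) -> str:
    """Return a hypothetical period label for a session date."""
    if session_date > LATE_PERIOD_AFTER:
        return "post-2022-02-24"
    if session_date >= COVID_START:
        return "covid"
    return "pre-covid"


if __name__ == "__main__":
    for d in (date(2018, 5, 3), date(2020, 3, 15), date(2022, 4, 1)):
        print(d.isoformat(), period_label(d))
```

If the period annotations shipped with the corpus disagree with these cut-offs, the annotations should be preferred.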
Created:
2025-07-04



