
Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 3.0

SSH Open MarketPlace | Updated 2025-07-04 | Indexed 2025-07-05
Download link:
https://marketplace.sshopencloud.eu/dataset/k9KO5q
Resource description:
This entry contains the linguistically annotated multilingual comparable corpora of parliamentary debates [ParlaMint.ana 3.0](http://hdl.handle.net/11356/1488), machine translated into English, with the translations themselves linguistically annotated. Apart from the translation into English, small changes in the metadata, and the absence of the British parliament corpus, the corpora in this entry are identical in all respects to the source-language corpora: the entry comprises the same 26 European parliamentary corpora, which together contain over 1.1 billion words. The translation was produced with [EasyNMT](https://github.com/UKPLab/EasyNMT) using [OPUS-MT models](https://github.com/Helsinki-NLP/Opus-MT). Machine translation was performed at the sentence level and covers both speeches and transcriber notes, including headings. The linguistic annotation of the speeches (tokenisation, tagging with UD PoS and morphological features, lemmatisation, and NER annotation) was done with [Stanza](https://stanfordnlp.github.io/stanza/), using the English language model. For NER, the conll03 model with 4 NE classes was used. The corpus is available for download from the CLARIN.SI repository and for browsing through the concordancers noSketchEngine and KonText.
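The translate-then-annotate workflow described above can be pictured roughly as follows. This is only a minimal sketch, not the ParlaMint team's actual pipeline: the function names (`translate_by_sentence`, `easynmt_translator`, `stanza_annotator`) are hypothetical, the `"opus-mt"` model name and the Stanza processor dictionary are assumptions based on those libraries' public documentation, and the exact options may differ between library versions. The third-party imports are deferred so that the alignment helper can be tried without downloading any models.

```python
from typing import Callable, List, Tuple

# CoNLL-2003 distinguishes four named-entity classes.
CONLL03_CLASSES = ("PER", "LOC", "ORG", "MISC")

# A translator maps (sentences, source language code) to English sentences.
Translator = Callable[[List[str], str], List[str]]


def translate_by_sentence(
    sentences: List[str], source_lang: str, translator: Translator
) -> List[str]:
    """Translate pre-split sentences one-to-one so that each English
    sentence stays aligned with its source sentence."""
    translated = translator(sentences, source_lang)
    if len(translated) != len(sentences):
        raise ValueError("translator must return one output per input sentence")
    return translated


def easynmt_translator() -> Translator:
    """Hypothetical helper: wrap EasyNMT with OPUS-MT models (assumed setup)."""
    from easynmt import EasyNMT  # third-party; downloads models on first use

    model = EasyNMT("opus-mt")

    def run(sentences: List[str], source_lang: str) -> List[str]:
        # Input is already split into sentences, so EasyNMT's own
        # sentence splitting is switched off.
        return model.translate(
            sentences,
            source_lang=source_lang,
            target_lang="en",
            perform_sentence_splitting=False,
        )

    return run


def stanza_annotator() -> Callable[[str], Tuple[list, list]]:
    """Hypothetical helper: English Stanza pipeline with the conll03 NER model.
    The processor dictionary is an assumption and may need adjusting
    (for example, adding other processors) for a given Stanza version."""
    import stanza  # third-party; models must be downloaded beforehand

    nlp = stanza.Pipeline(
        lang="en",
        processors={
            "tokenize": "default",
            "pos": "default",
            "lemma": "default",
            "ner": "conll03",
        },
    )

    def run(text: str) -> Tuple[list, list]:
        doc = nlp(text)
        # One tuple per word: form, lemma, UD PoS tag, morphological features.
        words = [
            (w.text, w.lemma, w.upos, w.feats)
            for sent in doc.sentences
            for w in sent.words
        ]
        # One tuple per named entity: surface text and entity class.
        entities = [(e.text, e.type) for e in doc.ents]
        return words, entities

    return run
```

With both libraries and their models installed, a call such as `translate_by_sentence(["Dober dan."], "sl", easynmt_translator())` followed by `stanza_annotator()(...)` on each English sentence would approximate the two stages. Passing the translator in as a parameter keeps the sentence-alignment check independent of any particular MT backend.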
Created:
2025-07-04