Semantically tagged Finnish parliament discussions 1991-2015
Indexed in NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7344944
Resource description:
The original data come from the ParlSpeech data set (Rauh et al., 2017):
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/E4RSP9
We extracted the raw text from the original data set, without the speaker and party tags. The Finnish text was first parsed with a UD2 (Universal Dependencies v2) parser using the Mylly service of the Language Bank of Finland. After the UD2 parse, semantic tags were added to the text with FiST (Kettunen, 2019).
Output format
# newdoc
# newpar
# sent_id = 1
# text = Arvoisa herra puhemies!
Arvoisa arvoisa Z99 amod
herra#herra#Noun#S2.2m S9 compound:nn
puhemies#puhemies#Noun#G1.1/S2 root
! PUNCT
Each token line of the output contains 1) the word form of the token, 2) its lemma, 3) its part of speech (POS), 4) its semantic tag(s), and 5) the UD2 dependency relation of the word in the sentence.
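A minimal Python sketch for reading this output into sentences and tokens follows. It assumes, based only on the sample above, that comment lines start with "# ", that a blank line or a new "# sent_id" comment starts a new sentence, and that a token line has the form word#lemma#POS#TAG [TAG ...] deprel, with the dependency relation as the last space-separated field. Lines without that '#' layout (such as "! PUNCT" in the sample) are kept with only a word form and their last field. The names Token, Sentence, parse_token and read_sentences are hypothetical and not part of the data set.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Token:
    word: str
    lemma: Optional[str]
    pos: Optional[str]
    tags: List[str]        # one or more semantic tags (ambiguity is not resolved)
    deprel: Optional[str]  # UD2 dependency relation (last field on the line)


@dataclass
class Sentence:
    sent_id: Optional[str] = None
    text: Optional[str] = None
    tokens: List[Token] = field(default_factory=list)


def parse_token(line: str) -> Token:
    """Parse one token line, assuming 'word#lemma#POS#TAG [TAG ...] deprel'."""
    parts = line.split()
    head = parts[0]
    deprel = parts[-1] if len(parts) > 1 else None
    if head.count("#") >= 3:
        # The first three '#' separate word, lemma and POS; the remainder of
        # the first field is the first semantic tag.
        word, lemma, pos, first_tag = head.split("#", 3)
        # Fields between the first one and the last one are further tags.
        tags = [first_tag] + parts[1:-1]
        return Token(word, lemma, pos, tags, deprel)
    # Fallback for lines without the '#' layout (e.g. '! PUNCT' in the sample).
    return Token(head, None, None, [], deprel)


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Group output lines into sentences (hypothetical helper)."""
    current = Sentence()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line is assumed to end the current sentence.
            if current.tokens:
                yield current
                current = Sentence()
        elif line.startswith("# "):
            key, _, value = line[2:].partition(" = ")
            if key == "sent_id":
                if current.tokens:
                    # New sentence without a preceding blank line.
                    yield current
                    current = Sentence()
                current.sent_id = value
            elif key == "text":
                current.text = value
            # Other comments such as '# newdoc' and '# newpar' are ignored.
        else:
            current.tokens.append(parse_token(line))
    if current.tokens:
        yield current


if __name__ == "__main__":
    sample = """# newdoc
# newpar
# sent_id = 1
# text = Arvoisa herra puhemies!
Arvoisa arvoisa Z99 amod
herra#herra#Noun#S2.2m S9 compound:nn
puhemies#puhemies#Noun#G1.1/S2 root
! PUNCT
"""
    for sent in read_sentences(sample.splitlines()):
        print(sent.sent_id, [t.word for t in sent.tokens])
        # prints: 1 ['Arvoisa', 'herra', 'puhemies', '!']
```

For a full file, passing an open file object (e.g. opened with encoding="utf-8") instead of sample.splitlines() streams the sentences one at a time, which matters for a corpus of over four million sentences.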
Our semantic tagger does not resolve ambiguity. If a lexeme has multiple possible semantic tags, they are all included in the output (e.g., herra#herra#Noun#S2.2m S9 compound:nn). Slash notation in a semantic tag (e.g., puhemies#puhemies#Noun#G1.1/S2) indicates that the word can belong to two or more categories (Löfberg, 2017). If the semantic category of a word is not recognized, the word is tagged Z99.
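One way to handle these conventions downstream is to treat each space-separated tag as an alternative reading and each slash-joined tag as a single reading that spans several categories. The Python sketch below follows that interpretation; the function names are hypothetical, and taking the first letter of a tag as its top-level field assumes FiST follows the USAS tagset conventions listed in the references.

```python
from typing import List, Set

UNKNOWN_TAG = "Z99"  # tag assigned when the semantic category is not recognized


def readings(tags: List[str]) -> List[List[str]]:
    """Split each alternative tag into the categories joined by '/'."""
    return [tag.split("/") for tag in tags]


def is_ambiguous(tags: List[str]) -> bool:
    # More than one space-separated tag: the tagger left the ambiguity unresolved.
    return len(tags) > 1


def is_unknown(tags: List[str]) -> bool:
    # An unrecognized word carries only the Z99 tag.
    return tags == [UNKNOWN_TAG]


def top_level_fields(tags: List[str]) -> Set[str]:
    """Collect USAS-style top-level field letters, e.g. 'S' from 'S2.2m'."""
    return {cat[0] for reading in readings(tags) for cat in reading if cat}
```

With the examples above, readings(["S2.2m", "S9"]) gives two separate readings, while readings(["G1.1/S2"]) gives one reading with two categories, so only the first token counts as ambiguous.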
The data consist of 4,036,269 sentences and approximately 65.247 million words. The lexical coverage of FiST on the data is 86.05%, i.e., about 86% of the words are known to the tagger and receive a semantic tag.
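The Python sketch below recomputes coverage from parsed tag lists and converts the reported figures into approximate absolute numbers. The coverage function assumes a definition that is not stated here: the share of tokens carrying at least one semantic tag whose tags are not just Z99, with untagged tokens (such as the "! PUNCT" line in the sample) skipped. It may therefore not reproduce the 86.05% figure exactly.

```python
from typing import Iterable, List


def coverage(tag_lists: Iterable[List[str]]) -> float:
    """Share of tagged tokens not tagged only Z99 (assumed definition)."""
    known = total = 0
    for tags in tag_lists:
        if not tags:
            # Tokens without any semantic tag (e.g. punctuation) are skipped.
            continue
        total += 1
        if tags != ["Z99"]:
            known += 1
    return known / total if total else 0.0


# Reported corpus figures, converted to approximate absolute numbers.
SENTENCES = 4_036_269
WORDS = 65_247_000  # "approximately 65.247 million"
REPORTED_COVERAGE = 0.8605

tagged_words = WORDS * REPORTED_COVERAGE
print(f"~{tagged_words / 1e6:.2f} million words tagged")              # ~56.15
print(f"~{(WORDS - tagged_words) / 1e6:.2f} million words untagged")  # ~9.10
print(f"~{WORDS / SENTENCES:.1f} words per sentence")                 # ~16.2
```

In other words, the reported coverage leaves roughly nine million word tokens without a recognized semantic category.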
References
Kettunen, Kimmo (2019). FiST – towards a Free Semantic Tagger of Modern Standard Finnish. IWCLUL 2019. http://aclweb.org/anthology/W19-0306
Rauh, Christian; De Wilde, Pieter; Schwalbach, Jan (2017). "Corp_Eduskundta.Rdata". The ParlSpeech data set: Annotated full-text vectors of 3.9 million plenary speeches in the key legislative chambers of seven European states. Harvard Dataverse, V1. https://doi.org/10.7910/DVN/E4RSP9/U8VZHK
Löfberg, L. (2017). Creating large semantic lexical resources for the Finnish language. Doctoral thesis, Lancaster University. https://doi.org/10.17635/lancaster/thesis/3
UCREL Semantic Analysis System (USAS). https://ucrel.lancs.ac.uk/usas/
Created: 2022-12-05



