Semantically tagged Finnish parliament discussions 1991-2015

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7344944
Description:
The original data comes from Rauh et al. (2017): https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/E4RSP9

We imported the raw text from the original data set without speaker and party tags. The Finnish text was first parsed with the UD2 parser using the Mylly service of the Language Bank of Finland. After the UD2 parse, semantic tags were added to the text with FiST (Kettunen, 2019).

Output form

    # newdoc
    # newpar
    # sent_id = 1
    # text = Arvoisa herra puhemies!
    Arvoisa arvoisa Z99 amod
    herra#herra#Noun#S2.2m S9 compound:nn
    puhemies#puhemies#Noun#G1.1/S2 root
    ! PUNCT

The output consists of 1) the word form of the token, 2) the lemma of the token, 3) the POS, 4) the semantic tag and 5) the UD2 grammatical relation of the word in the sentence.

Our semantic tagger does not resolve ambiguity. If a lexeme has multiple possible semantic tags, they are all included in the output (e.g., herra#herra#Noun#S2.2m S9 compound:nn). Slash notation in the semantic tags (e.g., puhemies#puhemies#Noun#G1.1/S2) indicates that the word can belong to two or more categories (Löfberg, 2017). If the semantic category of a word is not recognized, the word is tagged with Z99.

The data consists of 4 036 269 sentences and ca. 65.247 million words. The lexical coverage of FiST for the data is 86.05%, i.e. about 86% of the words are known to the tagger and marked with a semantic tag.

References

Kettunen, Kimmo (2019). FiST – towards a Free Semantic Tagger of Modern Standard Finnish. IWCLUL2019, http://aclweb.org/anthology/W19-0306

Rauh, Christian; De Wilde, Pieter; Schwalbach, Jan (2017). "Corp_Eduskundta.Rdata", The ParlSpeech data set: Annotated full-text vectors of 3.9 million plenary speeches in the key legislative chambers of seven European states, https://doi.org/10.7910/DVN/E4RSP9/U8VZHK, Harvard Dataverse, V1.

Löfberg, L. (2017). Creating large semantic lexical resources for the Finnish language. [Doctoral Thesis, Lancaster University]. Lancaster University. https://doi.org/10.17635/lancaster/thesis/3

UCREL Semantic Analysis System (USAS). https://ucrel.lancs.ac.uk/usas/
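
As an illustration of the output form described above, the following Python sketch reads token lines and estimates lexical coverage. It is not part of the data set or of FiST. It assumes token lines of the shape form#lemma#POS#tag, followed by any further ambiguous tags and then the UD2 relation as the last whitespace-separated field, and it assumes comment lines start with "# ". The names Token, parse_token_line, split_categories and lexical_coverage are hypothetical. Lines that do not match the assumed shape (such as the first and last token lines of the sample) are kept with the form only, and the coverage figure counts only tokens that carry a semantic tag, which may differ from how the reported 86.05% was computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNKNOWN_TAG = "Z99"  # tag the description gives for unrecognized words


@dataclass
class Token:
    form: str
    lemma: Optional[str] = None
    pos: Optional[str] = None
    sem_tags: List[str] = field(default_factory=list)
    deprel: Optional[str] = None


def parse_token_line(line: str) -> Optional[Token]:
    """Parse one line; return None for blank lines and '# ' comment lines."""
    line = line.strip()
    if not line or line.startswith("# "):
        return None
    fields = line.split()
    head, rest = fields[0], fields[1:]
    if head.count("#") == 3:
        # Assumed shape: form#lemma#POS#first_tag [more tags ...] relation
        form, lemma, pos, first_tag = head.split("#")
        tags = [first_tag] + rest[:-1]
        deprel = rest[-1] if rest else None
        return Token(form, lemma, pos, tags, deprel)
    # Unrecognized shape: keep only the surface form rather than guess.
    return Token(form=head)


def split_categories(tag: str) -> List[str]:
    """Split slash notation, e.g. 'G1.1/S2' into its member categories."""
    return tag.split("/")


def lexical_coverage(tokens: List[Token]) -> float:
    """Share of semantically tagged tokens with at least one non-Z99 tag."""
    tagged = [t for t in tokens if t.sem_tags]
    if not tagged:
        return 0.0
    known = sum(
        1 for t in tagged if any(tag != UNKNOWN_TAG for tag in t.sem_tags)
    )
    return known / len(tagged)


if __name__ == "__main__":
    sample = [
        "# sent_id = 1",
        "herra#herra#Noun#S2.2m S9 compound:nn",
        "puhemies#puhemies#Noun#G1.1/S2 root",
    ]
    tokens = [t for t in map(parse_token_line, sample) if t is not None]
    for tok in tokens:
        print(tok.form, tok.sem_tags, tok.deprel)
    print(lexical_coverage(tokens))
```

Keeping every candidate tag in a list mirrors the statement that the tagger does not resolve ambiguity; choosing one tag per token is left to downstream processing.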
Created:
2022-12-05