
Enriched Kotus word list

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4630519
Description:
The so-called Kotus word list consists of the words in the 1990s Perussanakirja (Basic Dictionary of Finnish). In its original form it is available here: https://kaino.kotus.fi/sanat/nykysuomi/

The version of the word list published here contains 94 385 lexemes and is a modification that combines information from two sources:

- UD1 (Universal Dependency Parser) of the Turku NLP group. The analysis runs were performed in The Language Bank of Finland.
- Semantic tags based on the UCREL Finnish semantic tag system (https://github.com/UCREL/Multilingual-USAS/tree/master/Finnish), assigned with the FiST semantic tagger.

If FiST has assigned semantic tags to the word, the output looks like this:

    aakkonen Noun Q3

If FiST did not analyze the word, it is given its UD1 analysis and the tag Z99:

    aallokas NOUN§ Case=Nom|Number=Sing Z99

UD1 was able to split 39 524 of the compounds that FiST did not analyze into constituents. Constituent boundaries are marked with #:

    aallonpituus aallon#pituus NOUN§ Case=Nom|Number=Sing Z99

The constituent boundaries are often correct, but there are also missing boundaries and odd analyses.

The lexical coverage of FiST on this data is low, 28.68%, because the word list contains about 52 269 compounds. Most of these are not included in the FiST lexicon. In many cases, however, they could be analyzed on the basis of their constituents.
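The three line shapes described above can be told apart mechanically. The following is a minimal sketch in Python, not part of the dataset: the `Entry` class and `parse_line` function are hypothetical names, and the code assumes that fields are separated by whitespace, that Z99 is always the last field of a UD1-analyzed line, and that the § character is a suffix on the part-of-speech tag. The actual file may use a different delimiter or carry several semantic tags per word, so check it against the real data before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    lemma: str
    pos: str
    semantic_tags: List[str]
    features: Dict[str, str] = field(default_factory=dict)
    constituents: Optional[List[str]] = None  # set only for split compounds


def parse_line(line: str) -> Entry:
    """Parse one line of the enriched word list (assumed whitespace-separated)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected line shape: {line!r}")

    # FiST-tagged word: lemma, part of speech, then one or more semantic tags.
    if fields[-1] != "Z99" or len(fields) == 3:
        return Entry(lemma=fields[0], pos=fields[1], semantic_tags=fields[2:])

    # UD1 fallback: [lemma, (segmentation,) POS§, features, Z99].
    pos = fields[-3].rstrip("§")
    features = dict(
        item.split("=", 1) for item in fields[-2].split("|") if "=" in item
    )
    # A fifth field means UD1 produced a '#'-delimited compound segmentation.
    constituents = fields[1].split("#") if len(fields) == 5 else None
    return Entry(
        lemma=fields[0],
        pos=pos,
        semantic_tags=["Z99"],
        features=features,
        constituents=constituents,
    )


if __name__ == "__main__":
    for sample in (
        "aakkonen Noun Q3",
        "aallokas NOUN§ Case=Nom|Number=Sing Z99",
        "aallonpituus aallon#pituus NOUN§ Case=Nom|Number=Sing Z99",
    ):
        print(parse_line(sample))
```

Keeping the constituents as a list makes it straightforward to look each part up in a semantic lexicon separately, which is the constituent-based analysis the description suggests for compounds.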
Created:
2021-03-24