Morphological lexicon Sloleks 2.0
Source: hdl.handle.net (indexed 2025-01-15)
Download link:
http://hdl.handle.net/11356/1230
Resource description:
Sloleks is the reference morphological lexicon for the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains approximately 100,000 of the most frequent Slovenian lemmas, their inflected or derivative word forms, and the corresponding grammatical descriptions. Lemmatization rules, part-of-speech categorization, and the set of feature-value pairs follow the JOS morphosyntactic specifications. In addition to grammatical information, each word form is annotated with its absolute corpus frequency and its compliance with the reference language standard.
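As a rough illustration of how an LMF-style XML lexicon of this kind might be read, the sketch below streams `LexicalEntry` elements and collects their feature-value pairs. It assumes the common LMF serialization in which features appear as `<feat att="..." val="..."/>` children of `LexicalEntry`, `Lemma`, and `WordForm`. The attribute names (`written_form`, `msd`, `frequency`) and the sample entry are made up for the example and are not taken from the Sloleks DTD, so the actual file should be checked before relying on them.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; element nesting and attribute names are assumptions.
SAMPLE_XML = b"""<Lexicon>
  <LexicalEntry>
    <feat att="part_of_speech" val="noun"/>
    <Lemma><feat att="written_form" val="miza"/></Lemma>
    <WordForm>
      <feat att="written_form" val="miza"/>
      <feat att="msd" val="Ncfsn"/>
      <feat att="frequency" val="120"/>
    </WordForm>
    <WordForm>
      <feat att="written_form" val="mize"/>
      <feat att="msd" val="Ncfsg"/>
      <feat att="frequency" val="45"/>
    </WordForm>
  </LexicalEntry>
</Lexicon>"""


def feats(element):
    """Collect the att/val pairs of the direct <feat> children of an element."""
    return {f.get("att"): f.get("val") for f in element.findall("feat")}


def iter_entries(xml_source):
    """Yield one dict per LexicalEntry, streaming so large files stay cheap."""
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "LexicalEntry":
            lemma = elem.find("Lemma")
            yield {
                "entry": feats(elem),
                "lemma": feats(lemma) if lemma is not None else {},
                "forms": [feats(wf) for wf in elem.findall("WordForm")],
            }
            elem.clear()  # release the subtree once it has been consumed


entries = list(iter_entries(io.BytesIO(SAMPLE_XML)))
```

Streaming with `iterparse` is chosen here only because a lexicon with around 100,000 lemmas and all their word forms is likely to be large; for small extracts, `ET.parse` would work just as well.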
Sloleks 2.0 adds accents that were assigned automatically using neural networks (Krsnik 2017) and partially corrected by hand, as well as automatically generated IPA and SAMPA transcriptions for lemmas and word forms.
The canonical version is encoded in XML and conforms to the Sloleks LMF DTD. The resource is also available as a TSV file in the MULTEXT-East format, with columns for word form, lemma, MSD, and frequency; the MSDs are additionally mapped to Universal Dependencies features.
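The TSV variant can be read with the standard library alone. The following is a minimal sketch, assuming the first four tab-separated columns are word form, lemma, MSD, and frequency in that order and that any further columns (for example the Universal Dependencies mapping) can be ignored. The column order, the sample rows, and the frequency values are illustrative assumptions, not data from the distributed file.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    wordform: str
    lemma: str
    msd: str
    frequency: int


def read_entries(stream):
    """Yield Entry tuples from a tab-separated stream, skipping malformed rows."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # too few columns to be a lexicon row
        try:
            frequency = int(row[3])
        except ValueError:
            continue  # e.g. a header line or a non-numeric frequency
        yield Entry(row[0], row[1], row[2], frequency)


def forms_by_lemma(entries):
    """Group entries under their lemma."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.lemma].append(entry)
    return dict(index)


# Made-up sample rows; the trailing columns stand in for a UD mapping.
SAMPLE_TSV = (
    "miza\tmiza\tNcfsn\t120\tNOUN\tCase=Nom\n"
    "mize\tmiza\tNcfsg\t45\tNOUN\tCase=Gen\n"
)

index = forms_by_lemma(read_entries(io.StringIO(SAMPLE_TSV)))
```

For the real file, `io.StringIO(SAMPLE_TSV)` would be replaced by something like `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded TSV.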
References:
Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec, 2017: The Sloleks Morphological Lexicon and its Future Development. In (Vojko Gorjanc, Polona Gantar, Iztok Kosem and Simon Krek, eds.): Dictionary of Modern Slovene: Problems and Solutions. Ljubljana University Press, Faculty of Arts. https://ebooks.uni-lj.si/ZalozbaUL/catalog/view/2/1/47
Krsnik, Luka. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja: magistrsko delo: magistrski program druge stopnje Računalništvo in informatika. Ljubljana: [L. Krsnik], 2017. http://eprints.fri.uni-lj.si/3978/
Provider:
hdl.handle.net



