List of word relations from the Sloleks 2.0 lexicon 1.1
Indexed from hdl.handle.net on 2025-01-09
Download link:
http://hdl.handle.net/11356/1986
Description:
This entry consists of a TSV file containing a list of 66,347 Slovene word pairs from the Sloleks Morphological Lexicon of Slovene (v2.0; http://hdl.handle.net/11356/1230) that have been automatically identified as morphologically related according to a number of manually designed morphological relation rules (e.g. "dež" -> "deževen", "pisati" -> "pisatelj", "prijatelj" -> "prijateljica").
Each line in the list contains the following columns:
- original lemma (e.g. "pisati"),
- related lemma (e.g. "pisatelj"),
- original lemma, automatically deconstructed into individual word parts (e.g. "pis_ati"),
- related lemma, automatically deconstructed into individual word parts (e.g. "pis_at_elj"),
- MTE-6 lexical features of the original lemma (e.g. "G"),*
- MTE-6 lexical features of the related lemma (e.g. "Som"),*
- ID of the original lemma from Sloleks 2.0,
- ID of the related lemma from Sloleks 2.0,
- the overlapping or central part (common to both the original and the related lemmas; e.g. "pis"),
- the ID of the morphological relation rule used to identify the relation (e.g. "G.Som.5.2.1"),
- the morphological relation rule (e.g. "[G]_ati -> [G]_at_elj").
* MTE-6 refers to MULTEXT-East Version 6 morphosyntactic specifications for Slovenian, available at http://nl.ijs.si/ME/V6/
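The column layout above can be read into named records along the following lines. This is only a minimal sketch: the field names, the `read_relations` helper, and the assumption that the file has no header row and keeps the eleven columns in the listed order are illustrative choices, not something the entry specifies. The two IDs in the sample row are placeholders, not real Sloleks identifiers.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class WordRelation(NamedTuple):
    # Field names are hypothetical; they follow the column order described above.
    original_lemma: str
    related_lemma: str
    original_parts: str
    related_parts: str
    original_msd: str
    related_msd: str
    original_id: str
    related_id: str
    common_part: str
    rule_id: str
    rule: str


def read_relations(handle: TextIO) -> Iterator[WordRelation]:
    """Yield one WordRelation per TSV row, skipping rows that are too short."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    width = len(WordRelation._fields)
    for row in reader:
        if len(row) < width:
            continue
        # Extra trailing columns (if any) are ignored in this sketch.
        yield WordRelation(*row[:width])


# One illustrative row built from the examples in the description.
sample = (
    "pisati\tpisatelj\tpis_ati\tpis_at_elj\tG\tSom\t"
    "PLACEHOLDER_ID_1\tPLACEHOLDER_ID_2\tpis\tG.Som.5.2.1\t"
    "[G]_ati -> [G]_at_elj\n"
)
relations = list(read_relations(io.StringIO(sample)))
```

For the real file, an open file handle (for example `open(path, encoding="utf-8", newline="")`) would take the place of the `io.StringIO` object. If the file does start with a header row, it would need to be skipped first.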
Each rule constitutes a pattern to form a morphological relation. For instance, "[G]_ati -> [G]_at_elj" indicates that a verb (G) ending with the word part "ati" is related to the lemma formed by replacing "_ati" with "_at_elj".
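To make the rule notation concrete, the sketch below applies a rule string to a lemma that has already been split into word parts. It assumes every rule has the shape "[X]suffix -> [Y]suffix" seen in the example. The regular expression and the `apply_rule` helper are hypothetical and are not the extraction process used to build the list.

```python
import re
from typing import Optional

# Assumed rule shape: "[G]_ati -> [G]_at_elj" (bracketed placeholder plus word parts).
RULE_RE = re.compile(
    r"^\[(?P<src>[^\]]+)\](?P<old>\S*)\s*->\s*\[(?P<dst>[^\]]+)\](?P<new>\S*)$"
)


def apply_rule(rule: str, segmented_lemma: str) -> Optional[str]:
    """Return the related segmented lemma, or None if the rule does not apply."""
    match = RULE_RE.match(rule.strip())
    if match is None:
        raise ValueError(f"unrecognised rule format: {rule!r}")
    old, new = match["old"], match["new"]
    if not old or not segmented_lemma.endswith(old):
        return None
    # Replace the trailing word part(s) of the original lemma with the new ones.
    return segmented_lemma[: -len(old)] + new


related = apply_rule("[G]_ati -> [G]_at_elj", "pis_ati")
surface = related.replace("_", "") if related is not None else None
not_applicable = apply_rule("[G]_ati -> [G]_at_elj", "gri_sti")
```

Here `related` is "pis_at_elj" and `surface` is "pisatelj", matching the example pair in the description. The call on "gri_sti" returns None because that lemma does not end in "_ati".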
Note that the list contains no proper nouns. It also contains no relations for 38 morphological rules that are included in the hierarchy of rules (listed in the accompanying file nssss_sloleks_word_relation_rules.tsv) but depend on additional rules that have not yet been implemented in the current version of the extraction process (such as irregular conversions in overlapping word parts: "gri_sti" - "griz_enj_e", "sneg" - "snež_ak").
Version 1.1 also contains manual evaluation scores for approximately 5,000 pairs which were sampled in a stratified manner (by rules). The pairs were reviewed by a linguist and assigned one of three scores (0 - inadequate; 1 - acceptable; 2 - adequate).
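One way to use the evaluation scores is to tabulate, per rule, how many reviewed pairs received each score. The sketch below assumes the scores have already been extracted as (rule ID, score) tuples; the entry does not say where the score is stored in the file. The rule IDs and scores in `toy_rows` are invented placeholders, not data from the list.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Score meanings as given in the description.
SCORE_LABELS = {0: "inadequate", 1: "acceptable", 2: "adequate"}


def score_distribution(rows: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Count score labels per rule ID for the manually evaluated pairs."""
    per_rule = defaultdict(Counter)
    for rule_id, score in rows:
        if score not in SCORE_LABELS:
            raise ValueError(f"unexpected score {score!r} for rule {rule_id!r}")
        per_rule[rule_id][SCORE_LABELS[score]] += 1
    return {rule_id: dict(counts) for rule_id, counts in per_rule.items()}


# Toy input only: placeholder rule IDs with made-up scores.
toy_rows = [("RULE_A", 2), ("RULE_A", 2), ("RULE_A", 0), ("RULE_B", 1)]
summary = score_distribution(toy_rows)
```

A per-rule breakdown like this fits the stratified sampling, since each rule contributes its own set of reviewed pairs.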
Provider:
hdl.handle.net



