CORDEX inflectional lookup data 1.0
Source: SSH Open MarketPlace. Updated: 2025-07-04. Indexed: 2025-07-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/kpVlNC
Description:
This lexicon consists of a pickled dictionary of 111,660 lemmas, and maps these lemmas to their corresponding word forms. This inflectional data lookup module serves as an optional component within the [cordex library](https://github.com/clarinsi/cordex/) that significantly improves the quality of the results.
Each word form in the dictionary is accompanied by its MULTEXT-East morphosyntactic descriptions, relevant features (custom features extracted from the morphosyntactic descriptions with the help of the [Conversion utilities tool](https://gitea.cjvt.si/generic/conversion_utils)), and its frequency within the [Gigafida 2.0 corpus](http://hdl.handle.net/11356/1320), or Gigafida 1.0 when other information is unavailable. The dictionary is used to select the most frequent word form of a lemma that satisfies additional filtering conditions (e.g. find the most frequently used word form of the lemma "centralen" in the singular, i.e. "centralni").
This resource is available for download from the CLARIN.SI repository.
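The lookup described above (pick the most frequent word form of a lemma that passes a feature filter) can be sketched in Python as follows. This is only an illustration under stated assumptions: the internal layout of the pickled dictionary is not documented in this entry, so the `toy_lexicon` structure, the field names (`form`, `msd`, `features`, `frequency`), the helper `most_frequent_form`, and all frequency values are hypothetical and may differ from what cordex actually uses.

```python
from typing import Optional

# Hypothetical layout: lemma -> list of word-form records.
# The structure of the real pickle may differ; the MSD tags and
# frequencies below are illustrative placeholders, not corpus data.
toy_lexicon = {
    "centralen": [
        {"form": "centralni", "msd": "Agpmsny",
         "features": {"number": "singular"}, "frequency": 1200},
        {"form": "centralna", "msd": "Agpfsn",
         "features": {"number": "singular"}, "frequency": 900},
        {"form": "centralne", "msd": "Agpfpn",
         "features": {"number": "plural"}, "frequency": 1500},
    ]
}


def most_frequent_form(lexicon: dict, lemma: str, **required: str) -> Optional[str]:
    """Return the most frequent form of `lemma` whose features match `required`."""
    candidates = [
        entry
        for entry in lexicon.get(lemma, [])
        if all(entry["features"].get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return None  # unknown lemma, or no form satisfies the filter
    return max(candidates, key=lambda entry: entry["frequency"])["form"]


print(most_frequent_form(toy_lexicon, "centralen", number="singular"))
```

With the toy data above, filtering for the singular returns "centralni" even though a plural form has a higher frequency, which mirrors the example given in the description.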
Created:
2025-07-04



