CORDEX inflectional lookup data 1.0

Source: SSH Open MarketPlace. Updated 2025-07-04; indexed 2025-07-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/kpVlNC
Description:
This lexicon is a pickled dictionary of 111,660 lemmas that maps each lemma to its corresponding word forms. It serves as an optional inflectional lookup module for the [cordex library](https://github.com/clarinsi/cordex/), where it significantly improves the quality of the results. Each word form in the dictionary is accompanied by its MULTEXT-East morphosyntactic description, its relevant features (custom features extracted from the morphosyntactic description with the help of the [Conversion utilities tool](https://gitea.cjvt.si/generic/conversion_utils)), and its frequency in the [Gigafida 2.0 corpus](http://hdl.handle.net/11356/1320), or in Gigafida 1.0 when other information is unavailable. The dictionary is used to select the most frequent word form of a lemma that satisfies additional filtering conditions (e.g. to find the most frequently used singular word form of the lemma "centralen", which is "centralni"). This resource is available for download from the CLARIN.SI repository.
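The selection step described above can be illustrated with a minimal Python sketch. It does not reproduce the actual layout of the pickled dictionary or the cordex API, neither of which is specified here: the `WordForm` record, the `most_frequent_form` helper, the feature key `number`, and the toy entries with their MSD tags and frequencies are all hypothetical and exist only to show the idea of filtering by features and then picking the most frequent form.

```python
from typing import NamedTuple, Optional


class WordForm(NamedTuple):
    """Hypothetical record for one word form (not the real on-disk layout)."""
    form: str
    msd: str          # MULTEXT-East morphosyntactic description (illustrative)
    features: dict    # custom features derived from the MSD (assumed keys)
    frequency: int    # corpus frequency (made-up numbers for this sketch)


# Toy stand-in for the lexicon: lemma -> list of word forms.
# The real resource would be loaded from the pickled file instead.
TOY_LEXICON = {
    "centralen": [
        WordForm("centralen", "Agpmsnn", {"number": "singular"}, 120),
        WordForm("centralni", "Agpmsny", {"number": "singular"}, 950),
        WordForm("centralne", "Agpfpn", {"number": "plural"}, 1400),
    ],
}


def most_frequent_form(lexicon: dict, lemma: str, **required) -> Optional[str]:
    """Return the most frequent form of `lemma` whose features match `required`."""
    candidates = [
        wf for wf in lexicon.get(lemma, [])
        if all(wf.features.get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return None  # unknown lemma, or no form satisfies the filter
    return max(candidates, key=lambda wf: wf.frequency).form


print(most_frequent_form(TOY_LEXICON, "centralen", number="singular"))
```

With the toy data, the plural form has the highest overall frequency, but the singular filter excludes it, so the lookup falls back to the most frequent singular form.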
Created:
2025-07-04