
CORDEX inflectional lookup data 1.0

Source: SSH Open MarketPlace | Updated: 2023-10-17 | Indexed: 2024-08-03
Download link:
https://marketplace.sshopencloud.eu/dataset/QE1TyJ
Resource description:
This lexicon is a pickled dictionary of 111,660 lemmas that maps each lemma to its corresponding word forms. The inflectional lookup data serves as an optional component of the [cordex library](https://github.com/clarinsi/cordex/) and significantly improves the quality of its results. Each word form in the dictionary is accompanied by its MULTEXT-East morphosyntactic description, relevant features (custom features extracted from the morphosyntactic descriptions with the help of the [Conversion utilities tool](https://gitea.cjvt.si/generic/conversion_utils)), and its frequency in the [Gigafida 2.0 corpus](http://hdl.handle.net/11356/1320), or in Gigafida 1.0 when other information is unavailable. The dictionary is used to select the most frequent word form of a lemma that satisfies additional filtering conditions (e.g., to find the most frequently used singular word form of the lemma "centralen", which is "centralni"). This resource is available for download from the CLARIN.SI repository.
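
The selection step described above can be illustrated with a minimal sketch. The internal layout of the pickled dictionary is not documented here, so the structure below (a lemma mapped to a list of entries carrying a word form, a morphosyntactic description, a feature dictionary, and a frequency) is an assumption, as are the names `FormEntry`, `toy_lexicon`, and `most_frequent_form`. The frequency counts are invented for illustration and are not taken from Gigafida.

```python
from typing import Dict, List, NamedTuple, Optional


class FormEntry(NamedTuple):
    """One word form of a lemma (hypothetical layout, not the real pickle schema)."""
    form: str
    msd: str                  # MULTEXT-East morphosyntactic description
    features: Dict[str, str]  # features derived from the MSD
    frequency: int            # corpus frequency (invented numbers below)


# Toy stand-in for the real lexicon; counts are illustrative only.
toy_lexicon: Dict[str, List[FormEntry]] = {
    "centralen": [
        FormEntry("centralni", "Agpmsny", {"number": "singular"}, 900),
        FormEntry("centralna", "Agpfsn", {"number": "singular"}, 700),
        FormEntry("centralnih", "Agpmpg", {"number": "plural"}, 400),
    ]
}


def most_frequent_form(
    lexicon: Dict[str, List[FormEntry]], lemma: str, **required: str
) -> Optional[str]:
    """Return the most frequent form of `lemma` whose features match `required`."""
    candidates = [
        entry
        for entry in lexicon.get(lemma, [])
        if all(entry.features.get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return None  # unknown lemma, or no form satisfies the filter
    return max(candidates, key=lambda entry: entry.frequency).form


print(most_frequent_form(toy_lexicon, "centralen", number="singular"))
```

With the real resource, the dictionary would presumably be loaded with Python's `pickle` module before any such lookup. The actual keys and value layout should be checked against the cordex source rather than inferred from this sketch.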
Created:
2023-10-17