The CLASSLA-Stanza model for lemmatisation of non-standard Croatian 2.1
Source: SSH Open Marketplace. Updated: 2023-10-13. Indexed: 2024-08-03.
Download link:
https://marketplace.sshopencloud.eu/dataset/CLRmee
Description:
The model for lemmatisation of non-standard Croatian was built with the [CLASSLA-Stanza tool](https://github.com/clarinsi/classla) by training on the [hr500k training corpus](http://hdl.handle.net/11356/1792) and the [ReLDI-NormTagNER-hr corpus](http://hdl.handle.net/11356/1793), using the [hrLex inflectional lexicon](http://hdl.handle.net/11356/1232). To handle text with missing diacritics, both corpora were augmented by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~94.23.
The model is available for download from the CLARIN.SI repository.
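
The diacritic augmentation mentioned in the description can be illustrated with a minimal sketch. It is not the authors' actual preprocessing code: the function names, the `fraction` parameter, the `(form, lemma)` sentence layout, the mapping of `đ` to `d`, and the choice to keep diacritics in the lemmas are all assumptions made for illustration.

```python
import random

# Croatian letters with diacritics and their ASCII replacements.
# Assumption: "đ" is mapped to "d"; the real pipeline may differ.
_STRIP = str.maketrans("čćšžđČĆŠŽĐ", "ccszdCCSZD")


def strip_diacritics(text: str) -> str:
    """Return text with Croatian diacritics replaced by ASCII letters."""
    return text.translate(_STRIP)


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random share of the sentences.

    Each sentence is a list of (form, lemma) pairs (hypothetical layout).
    Only the surface forms are stripped; lemmas are kept unchanged so a
    model could learn to map "kuce" to the lemma "kuća".
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, k)
    stripped = [
        [(strip_diacritics(form), lemma) for form, lemma in sent]
        for sent in sampled
    ]
    # The original sentences stay in place; the copies are appended.
    return sentences + stripped
```

For example, `strip_diacritics("šećer")` gives `"secer"`, and calling `augment(corpus, fraction=1.0)` doubles the corpus, with the second half containing diacritic-free forms paired with the original lemmas.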
Created:
2023-10-13



