The CLASSLA-Stanza model for lemmatisation of non-standard Slovenian 2.1

SSH Open MarketPlace, updated 2023-10-13, indexed 2024-08-03

Download link:

https://marketplace.sshopencloud.eu/dataset/es6XH9


Resource description:

The model for lemmatisation of non-standard Slovenian was built with the [CLASSLA-Stanza tool](https://github.com/clarinsi/classla) by training on the [SUK training corpus](http://hdl.handle.net/11356/1747) and the [Janes-Tag corpus](http://hdl.handle.net/11356/1732), using the [CLARIN.SI-embed.sl word embeddings](http://hdl.handle.net/11356/1204) expanded with the [MaCoCu-sl Slovene web corpus](http://hdl.handle.net/11356/1517). To handle missing diacritics, the training corpora were additionally augmented by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~91.45. The model is available for download from the CLARIN.SI repository.
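The diacritic augmentation mentioned in the description can be illustrated with a minimal Python sketch. This is not the authors' preprocessing code: the function names `strip_diacritics` and `augment`, the `fraction` parameter, the `(form, lemma)` data layout, and the choice to leave lemmas untouched are all assumptions made for illustration.

```python
import random
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks, so 'č' becomes 'c', 'š' becomes 's', 'ž' becomes 'z'."""
    # NFD splits a precomposed letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def augment(sentences, fraction=0.5, seed=0):
    """Return the original sentences followed by diacritic-free copies of a random subset.

    `sentences` is a list of sentences, each a list of (form, lemma) pairs.
    Assumption: only the surface form loses its diacritics; the lemma is kept
    as-is so the model can learn to map 'zaba' to the lemma 'žaba'.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sample = rng.sample(sentences, k)
    copies = [
        [(strip_diacritics(form), lemma) for form, lemma in sentence]
        for sentence in sample
    ]
    return sentences + copies


if __name__ == "__main__":
    corpus = [[("žabe", "žaba")], [("miši", "miš")]]
    for sentence in augment(corpus, fraction=1.0):
        print(sentence)
```

The NFD-based approach only removes marks that Unicode decomposes, which covers the Slovenian letters č, š and ž; letters such as đ have no decomposition and would need an explicit mapping.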

Created:

2023-10-13
