The CLASSLA-StanfordNLP model for named entity recognition of non-standard Croatian 1.0

Name: The CLASSLA-StanfordNLP model for named entity recognition of non-standard Croatian 1.0
Creator: hdl.handle.net
License: No description available

Indexed from hdl.handle.net on 2025-03-27

Download link:

http://hdl.handle.net/11356/1340

Resource description:

This model for named entity recognition of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). To handle text with missing diacritics, the training corpora were additionally augmented by repeating parts of them with the diacritics removed.
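
The diacritic-removal augmentation described above can be sketched in Python. This is a minimal illustration, not the CLASSLA-StanfordNLP training code: the function names, the sentence format (a list of (token, NER tag) pairs), the sampled fraction, the random seed, and the choice to map đ to "d" (rather than the also-common "dj") are all assumptions made for this example.

```python
import random

# Hypothetical mapping from Croatian letters with diacritics to base letters.
# Mapping đ/Đ to "d"/"D" is an assumption; "dj" is another common informal spelling.
_DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "ž": "z", "š": "s", "đ": "d",
    "Č": "C", "Ć": "C", "Ž": "Z", "Š": "S", "Đ": "D",
})


def strip_diacritics(token: str) -> str:
    """Return the token with Croatian diacritics replaced by base letters."""
    return token.translate(_DIACRITIC_MAP)


def augment_corpus(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random subset of sentences.

    Each sentence is a list of (token, ner_tag) pairs. Tags are kept
    unchanged, so the copies remain valid NER training examples.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    chosen = rng.sample(sentences, k)
    copies = [[(strip_diacritics(tok), tag) for tok, tag in sent] for sent in chosen]
    return list(sentences) + copies
```

For example, `augment_corpus([[("Ivan", "B-PER"), ("živi", "O"), ("u", "O"), ("Đakovu", "B-LOC")]], fraction=1.0)` keeps the original sentence and appends a copy with the tokens "zivi" and "Dakovu". The entity labels are untouched, which is what lets the model learn to recognise the same entities whether or not the input carries diacritics.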

Provider:

hdl.handle.net