The CLASSLA-StanfordNLP model for named entity recognition of non-standard Serbian 1.0

Name: The CLASSLA-StanfordNLP model for named entity recognition of non-standard Serbian 1.0
Creator: hdl.handle.net
License: No description available

Indexed from hdl.handle.net on 2025-03-27

Download link:

http://hdl.handle.net/11356/1341

Description:

This model for named entity recognition of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240) and the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.
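The diacritic-removal augmentation described above can be sketched roughly as follows. This is only an illustration, not the authors' actual procedure: the character mapping (in particular "đ" to "dj"), the `fraction` parameter, the random sampling, and the function names `strip_diacritics` and `augment` are all assumptions, since the description does not say which parts of the corpora were repeated or how each character was mapped.

```python
import random

# Assumed mapping for Serbian/Croatian Latin script. The source does not
# state the mapping used by the authors (e.g. "đ" -> "dj" versus "d").
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})


def strip_diacritics(text):
    """Replace diacritic-bearing letters with plain ASCII equivalents."""
    return text.translate(DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Return the corpus plus diacritic-free copies of a sampled subset.

    `sentences` is a list of sentences, each a list of (token, ner_tag)
    pairs. The NER tags are copied unchanged; only the tokens are altered.
    `fraction` is a hypothetical knob for how much of the corpus to repeat.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, k)
    stripped = [
        [(strip_diacritics(token), tag) for token, tag in sentence]
        for sentence in sampled
    ]
    return sentences + stripped
```

For instance, `strip_diacritics("Đoković živi u Čačku")` yields `"Djokovic zivi u Cacku"` under this mapping. Keeping the original sentences alongside the stripped copies lets a model see both spellings with the same labels, which is the apparent intent of the augmentation.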

Provider:

hdl.handle.net
