The CLASSLA-StanfordNLP model for named entity recognition of non-standard Croatian 1.0
Source: hdl.handle.net (indexed 2025-03-27)
Download link:
http://hdl.handle.net/11356/1340
Resource description:
This model for named entity recognition of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.
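The diacritic augmentation described above can be illustrated with a minimal sketch. It is not the authors' actual preprocessing code. The character mapping (in particular "đ" to "dj"), the `fraction` parameter, and the function names `strip_diacritics` and `augment` are assumptions made for illustration; the description only states that parts of the corpora were repeated with diacritics removed.

```python
import random

# Assumed mapping for Croatian/Serbian Latin diacritics; the resource
# description does not specify one. "đ" -> "dj" is one common convention.
DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})


def strip_diacritics(token):
    """Return the token with diacritics replaced by plain ASCII letters."""
    return token.translate(DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random subset of sentences.

    sentences: list of sentences, each a list of (token, ner_tag) pairs.
    fraction:  hypothetical share of sentences to repeat without diacritics.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sampled = rng.sample(sentences, k)
    stripped = [
        [(strip_diacritics(tok), tag) for tok, tag in sent]
        for sent in sampled
    ]
    # Original sentences are kept; the stripped copies are added on top,
    # with the NER tags left unchanged.
    return sentences + stripped
```

Keeping the original sentences and adding stripped copies, rather than replacing them, lets a model see both spellings of the same entity with the same label, which is the apparent intent of the augmentation.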
Provider:
hdl.handle.net



