The CLASSLA-StanfordNLP model for named entity recognition of non-standard Serbian 1.0
Source: hdl.handle.net (indexed 2025-03-27)
Download link:
http://hdl.handle.net/11356/1341
Description:
This model for named entity recognition of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240) and the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.
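The diacritic augmentation described above can be illustrated with a minimal sketch. This is not the authors' actual preprocessing code: the function names `strip_diacritics` and `augment`, the `fraction` and `seed` parameters, the `(token, tag)` sentence layout, and the choice to map "đ" to "d" (rather than "dj") are all assumptions made for illustration.

```python
import random

# Assumed mapping for Serbian Latin letters with diacritics.
# "đ" is mapped to "d" here; "dj" is also common in informal writing.
_DIACRITIC_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "d",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "D",
})


def strip_diacritics(text):
    """Return text with Serbian Latin diacritics replaced by plain letters."""
    return text.translate(_DIACRITIC_MAP)


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random subset of sentences.

    sentences: list of sentences, each a list of (token, tag) pairs.
    fraction:  hypothetical share of sentences to repeat without diacritics.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    picked = rng.sample(sentences, k)
    stripped = [
        [(strip_diacritics(token), tag) for token, tag in sentence]
        for sentence in picked
    ]
    # The original sentences stay unchanged; stripped copies are added after them.
    return sentences + stripped
```

Keeping the original sentences alongside the stripped copies means a model trained on the result sees both spellings of the same entity with the same label, which is the point of the augmentation.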
Provider:
hdl.handle.net



