Cross-lingual Named Entity Corpus for Slavic Languages

Name: Cross-lingual Named Entity Corpus for Slavic Languages
Creator: Polish Academy of Sciences
Published: 2024-04-08 00:56:35
License: Not specified

Source: arXiv. Updated 2024-04-08; indexed 2024-06-21.

Download link:

http://github.com/SlavicNLP/SlavicNER

Description:

The Cross-lingual Named Entity Corpus for Slavic Languages is a manually annotated named entity corpus for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian, and Ukrainian. It is the product of a series of shared tasks run at the Slavic Natural Language Processing workshops between 2017 and 2023, and it contains 5,017 documents on seven popular topics. Five classes of entities are annotated in the documents: persons, organizations, locations, events, and products. Each entity is described by its category, its lemma, and a unique cross-lingual identifier. The dataset supports entity recognition, classification, lemmatization, and linking, and is intended to promote entity-related research on Slavic languages.
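To make the annotation scheme concrete, here is a minimal Python sketch of one entity record and of how a shared cross-lingual identifier ties together mentions from different languages. It rests on assumptions that this page does not confirm: the tab-separated layout (mention, lemma, category, identifier), the category tags, the sample rows, and every function and class name are illustrative only. Check the repository for the actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed tags for the five entity classes named in the description
# (persons, organizations, locations, events, products).
CATEGORIES = {"PER", "ORG", "LOC", "EVT", "PRO"}


@dataclass(frozen=True)
class EntityAnnotation:
    mention: str    # surface form as it appears in the document
    lemma: str      # base form of the entity name
    category: str   # one of CATEGORIES
    entity_id: str  # cross-lingual unique identifier


def parse_line(line: str) -> EntityAnnotation:
    """Parse one annotation row in the assumed tab-separated layout."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(parts)}")
    mention, lemma, category, entity_id = parts
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return EntityAnnotation(mention, lemma, category, entity_id)


def group_by_entity(annotations):
    """Collect annotations that share a cross-lingual identifier."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann.entity_id].append(ann)
    return dict(groups)


# Invented sample rows: a Polish and a Ukrainian mention of the same city.
sample_lines = [
    "Warszawie\tWarszawa\tLOC\tLOC-Warsaw",
    "Варшаві\tВаршава\tLOC\tLOC-Warsaw",
]
annotations = [parse_line(line) for line in sample_lines]
groups = group_by_entity(annotations)
```

Grouping by identifier rather than by lemma is the point of the sketch: the lemmas differ across languages and scripts, while the identifier is what links them as one entity.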

Provider:

Polish Academy of Sciences

Created:

2024-03-31
