RONEC
Source: arXiv | Updated: 2020-04-28 | Indexed: 2024-06-21
Download link:
https://github.com/dumitrescustefan/ronec
Summary:
RONEC (Romanian Named Entity Corpus) is a named entity corpus for Romanian, created by the Polytechnic University of Bucharest. It contains 5,127 sentences annotated with 26,377 named entities across 16 entity classes. The text comes from copyright-free newspaper articles, and the corpus was built to address the shortage of Romanian resources for named entity recognition (NER). The annotation guidelines were refined over several rounds of annotation and discussion to improve the consistency and quality of the final annotations. RONEC can support natural language processing (NLP) tasks such as information extraction and machine translation.
Provider:
Polytechnic University of Bucharest
Created:
2019-09-03



