RuCoCo

Source: arXiv. Updated: 2022-06-10. Indexed: 2024-06-21.
Download link:
https://github.com/vdobrovolskii/rucoco
Description:
RuCoCo is a coreference-annotated corpus for Russian, created jointly by ABBYY, the Moscow Institute of Physics and Technology (MIPT), and the Russian State University for the Humanities (RSUH). It contains about 1 million words and roughly 150,000 mentions, drawn mainly from news texts. Part of the corpus was annotated from scratch; for the rest, machine-generated annotations were corrected by human annotators. The corpus is intended to support coreference resolution research by combining a large volume of annotated text with high inter-annotator agreement. It is suited to Russian natural language processing (NLP) work, in particular coreference and anaphora resolution.
Provider:
ABBYY, Moscow Institute of Physics and Technology (MIPT), and Russian State University for the Humanities (RSUH)
Created:
2022-06-10