
iluvvatar/RuNNE

Hugging Face · Updated 2023-03-30 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/iluvvatar/RuNNE
Resource description:
---
language:
- ru
multilinguality:
- monolingual
task_categories:
- token-classification
task_ids:
- named-entity-recognition
pretty_name: RuNNE
---

# RuNNE dataset

## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Structure](#dataset-structure)
- [Citation Information](#citation-information)
- [Contacts](#contacts)

## Dataset Description
Part of the NEREL dataset (https://arxiv.org/abs/2108.13112), a Russian dataset for named entity recognition and relation extraction, used in the RuNNE (2022) competition (https://github.com/dialogue-evaluation/RuNNE).

Entities may be nested (see https://arxiv.org/abs/2108.13112).

Entity types list:

* AGE
* AWARD
* CITY
* COUNTRY
* CRIME
* DATE
* DISEASE
* DISTRICT
* EVENT
* FACILITY
* FAMILY
* IDEOLOGY
* LANGUAGE
* LAW
* LOCATION
* MONEY
* NATIONALITY
* NUMBER
* ORDINAL
* ORGANIZATION
* PENALTY
* PERCENT
* PERSON
* PRODUCT
* PROFESSION
* RELIGION
* STATE_OR_PROVINCE
* TIME
* WORK_OF_ART

## Dataset Structure
There are two "configs" or "subsets" of the dataset.

Using `load_dataset('MalakhovIlya/RuNNE', 'ent_types')['ent_types']` you can download the list of entity types (`Dataset({features: ['type'], num_rows: 29})`).

Using `load_dataset('MalakhovIlya/RuNNE', 'data')` or `load_dataset('MalakhovIlya/RuNNE')` you can download the data itself (a `DatasetDict`).

The dataset consists of 3 splits: "train", "test" and "dev". Each of them contains text documents. The "train" and "test" splits also contain annotated entities; "dev" doesn't.

Each entity is represented by a string of the following format: "\<start> \<stop> \<type>", where \<start> is the position of the first symbol of the entity in the text, \<stop> is the position of its last symbol in the text, and \<type> is one of the types listed above.

P.S. The original NEREL dataset also contains relations, events and linked entities, but they have not been added here yet ¯\\\_(ツ)_/¯

## Citation Information

    @article{Artemova2022runne,
      title={{RuNNE-2022 Shared Task: Recognizing Nested Named Entities}},
      author={Artemova, Ekaterina and Zmeev, Maksim and Loukachevitch, Natalia and Rozhkov, Igor and Batura, Tatiana and Braslavski, Pavel and Ivanov, Vladimir and Tutubalina, Elena},
      journal={Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog"},
      year={2022}
    }
Provider:
iluvvatar
Summary of original information

RuNNE Dataset Summary

Dataset description

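RuNNE is part of the NEREL dataset, a Russian dataset for named entity recognition and relation extraction. It was used in the RuNNE competition in 2022. Entities in the dataset may be nested.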
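
Because entities may be nested, one span can lie entirely inside another. The following is a minimal sketch of how such pairs could be detected once annotations are available as `(start, stop, type)` tuples. The helper name `find_nested` and the sample spans are hypothetical and are not part of the dataset or of any library.

```python
from typing import List, Tuple

# (start, stop, type) for one annotated entity; layout is an assumption
# chosen to mirror the "<start> <stop> <type>" annotation strings.
Span = Tuple[int, int, str]


def find_nested(entities: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies within outer.

    Entities that cover exactly the same span are not reported as nested.
    """
    pairs = []
    for i, outer in enumerate(entities):
        for j, inner in enumerate(entities):
            if i == j:
                continue
            same_span = outer[:2] == inner[:2]
            contained = outer[0] <= inner[0] and inner[1] <= outer[1]
            if contained and not same_span:
                pairs.append((outer, inner))
    return pairs


if __name__ == "__main__":
    # Hypothetical spans, for illustration only.
    sample = [(0, 22, "ORGANIZATION"), (17, 22, "COUNTRY"), (30, 35, "CITY")]
    for outer, inner in find_nested(sample):
        print(inner, "is nested in", outer)
```

The quadratic loop is fine for the handful of entities in a single document; a sort-based sweep would be a reasonable replacement for much longer inputs.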

Entity types

  • AGE
  • AWARD
  • CITY
  • COUNTRY
  • CRIME
  • DATE
  • DISEASE
  • DISTRICT
  • EVENT
  • FACILITY
  • FAMILY
  • IDEOLOGY
  • LANGUAGE
  • LAW
  • LOCATION
  • MONEY
  • NATIONALITY
  • NUMBER
  • ORDINAL
  • ORGANIZATION
  • PENALTY
  • PERCENT
  • PERSON
  • PRODUCT
  • PROFESSION
  • RELIGION
  • STATE_OR_PROVINCE
  • TIME
  • WORK_OF_ART

Dataset structure

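The dataset has three splits: train, test, and dev. The train and test splits contain annotated entities; the dev split does not. Each entity is formatted as "<start> <stop> <type>".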
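
The entity string format above can be parsed with a few lines of Python. This is a minimal sketch, not the dataset author's code: the names `Entity`, `parse_entity`, and `entity_text` are hypothetical, and the sample sentence and offsets are made up for illustration. It also assumes, following the dataset card's wording that the stop value is the position of the last symbol, that the stop offset is inclusive; pass `stop_inclusive=False` if the actual data turns out to use exclusive offsets.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int     # position of the first symbol of the entity
    stop: int      # position of the last symbol of the entity
    ent_type: str  # one of the entity types listed in the dataset card


def parse_entity(raw: str) -> Entity:
    """Parse one "<start> <stop> <type>" annotation string."""
    start, stop, ent_type = raw.split(maxsplit=2)
    return Entity(int(start), int(stop), ent_type)


def entity_text(text: str, ent: Entity, stop_inclusive: bool = True) -> str:
    """Return the substring of text covered by the entity.

    stop_inclusive=True is an assumption based on the dataset card's
    description of <stop> as the last symbol position.
    """
    end = ent.stop + 1 if stop_inclusive else ent.stop
    return text[ent.start:end]


if __name__ == "__main__":
    # Made-up example sentence and annotations, for illustration only.
    text = "Москва - столица России"
    for raw in ["0 5 CITY", "17 22 COUNTRY"]:
        ent = parse_entity(raw)
        print(ent.ent_type, repr(entity_text(text, ent)))
```

Keeping the inclusive or exclusive choice as a parameter makes it easy to check a few entities against the text before processing a whole split.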

