leondz/wnut_17
Dataset overview
Dataset name
- Name: WNUT 17
- Alias: wnut_17
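Dataset description
- Task: recognizing emerging and rare entities
- Language: English (en)
- License: CC-BY-4.0
- Source data: original
- Multilinguality: monolingual
- Size: 1K<n<10K
- Task category: token classification
- Task ID: named-entity-recognition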
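Because the task is framed as token classification, each token's IOB2 tag is usually mapped to an integer class id. The sketch below builds such an inventory with "O" first, followed by a B-/I- pair per entity type. The six types are those of the WNUT 2017 shared task (corporation, creative-work, group, location, person, product); the ordering used here, and the helper name `build_iob2_labels`, are assumptions for illustration, so the resulting ids should be checked against the dataset's own label names before use.

```python
# Hypothetical helper: build an IOB2 label inventory for token classification.
# ENTITY_TYPES and their order are an assumption (the six WNUT 2017 types);
# verify against the dataset's own class-label names before relying on ids.
ENTITY_TYPES = ["corporation", "creative-work", "group", "location", "person", "product"]


def build_iob2_labels(entity_types):
    """Return (labels, label2id, id2label) with "O" at id 0, then B-/I- pairs."""
    labels = ["O"]
    for etype in entity_types:
        labels.append(f"B-{etype}")  # first token of an entity
        labels.append(f"I-{etype}")  # any following token of the same entity
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return labels, label2id, id2label


labels, label2id, id2label = build_iob2_labels(ENTITY_TYPES)
print(len(labels))           # 13
print(label2id["B-person"])  # 9
```

Putting "O" at id 0 and pairing B-/I- per type gives 1 + 2 × 6 = 13 classes, which is the shape a token-classification head would need under these assumptions.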
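Dataset structure
- Features:
  - id: string, the example ID
  - tokens: sequence of strings, the tokens of the example text
  - ner_tags: sequence of class labels, one NER tag per token, in IOB2 format
- Splits:
  - train: 3394 examples
  - validation: 1009 examples
  - test: 1287 examples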
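In IOB2, every entity begins with a `B-<type>` tag and any following token of the same entity carries `I-<type>`; tokens outside entities are `O`. A minimal sketch of grouping per-token tags back into entity spans follows. It assumes the tags are already strings; since `ner_tags` is stored as class labels, they would first be mapped through the dataset's label names (in Hugging Face `datasets`, typically `features["ner_tags"].feature.names`). The function name and the example sentence are hypothetical, and treating a stray `I-` tag as a new entity is a lenient choice of this sketch rather than part of the format.

```python
from typing import List, Tuple

Span = Tuple[str, int, int, str]  # (entity_type, start, end_exclusive, surface text)


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group IOB2 tag strings into entity spans.

    B-<type> opens an entity; I-<type> extends it only when it follows a tag
    of the same type. A stray I- tag is leniently treated as opening a new
    entity (a design choice of this sketch, not part of the IOB2 definition).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Span] = []
    cur_type, cur_start = None, None
    # A trailing sentinel "O" flushes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, etype = tag.partition("-")  # "B-creative-work" -> ("B", "-", "creative-work")
        continues = prefix == "I" and etype == cur_type
        if cur_type is not None and not continues:
            spans.append((cur_type, cur_start, i, " ".join(tokens[cur_start:i])))
            cur_type, cur_start = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            cur_type, cur_start = etype, i
    return spans


# Hypothetical example sentence; the tags are illustrative, not taken from WNUT 17.
tokens = ["I", "love", "New", "York", "and", "Taylor", "Swift"]
tags = ["O", "O", "B-location", "I-location", "O", "B-person", "I-person"]
print(iob2_to_spans(tokens, tags))
# [('location', 2, 4, 'New York'), ('person', 5, 7, 'Taylor Swift')]
```

Span-level output like this is what entity-level evaluation compares, as opposed to per-token tag accuracy.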
Dataset creation
- Annotation creators: crowdsourced
- Language creators: found
Considerations for using the data
-
Citation information:
@inproceedings{derczynski-etal-2017-results,
  title = "Results of the {WNUT}2017 Shared Task on Novel and Emerging Entity Recognition",
  author = "Derczynski, Leon and Nichols, Eric and van Erp, Marieke and Limsopatham, Nut",
  booktitle = "Proceedings of the 3rd Workshop on Noisy User-generated Text",
  month = sep,
  year = "2017",
  address = "Copenhagen, Denmark",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/W17-4418",
  doi = "10.18653/v1/W17-4418",
  pages = "140--147",
  abstract = "This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet {``}so.. kktny in 30 mins?!{''} {--} even human experts find the entity {`}kktny{'} hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities, and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.",
}




