WNUT17 Named Entity Recognition Dataset

Name: WNUT17 Named Entity Recognition Dataset
Creator: maas
Published: 2026-04-20 18:24:01
License: No description available

ModelScope community: updated 2026-04-20, indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/iic/wnut17_ner

Resource description:

# WNUT17 Named Entity Recognition Dataset

## Dataset Overview

The WNUT17 dataset is an English named entity recognition dataset for social media text.

### Dataset Summary

The dataset comprises a training set (3394 samples), a validation set (1009 samples), and a test set (1287 samples). The entity types are corporation, creative-work, group, location, person, and product.

### Dataset Format and Structure

The data follows the CoNLL standard format and is split into two columns: the first column holds the tokens of the input sentence, and the second column holds the named entity type label for each token. A concrete example:

```
Visuals O
of O
the O
avalanche O
site O
in O
Gurez B-location
sector I-location
. O
```

## Dataset License

Creative Commons Attribution 4.0 International.

## Citation

```bib
@inproceedings{derczynski-etal-2017-results,
    title = "Results of the {WNUT}2017 Shared Task on Novel and Emerging Entity Recognition",
    author = "Derczynski, Leon and Nichols, Eric and van Erp, Marieke and Limsopatham, Nut",
    booktitle = "Proceedings of the 3rd Workshop on Noisy User-generated Text",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W17-4418",
    doi = "10.18653/v1/W17-4418",
    pages = "140--147",
    abstract = "This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet {``}so.. kktny in 30 mins?!{''} {--} even human experts find the entity {`}kktny{'} hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities, and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.",
}
```
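The two-column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the dataset's official loader: it assumes one token per line, whitespace between the token and its tag, and a blank line between sentences. The function names `parse_conll` and `extract_entities` are hypothetical and introduced here only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def parse_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group two-column token/tag lines into sentences.

    Assumption: a blank line marks a sentence boundary. A non-blank line
    with only one column raises ValueError.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last whitespace run so the tag is the final column.
        token, tag = line.rsplit(None, 1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) spans from B-/I-/O tags.

    An I- tag that does not continue an open span of the same type is
    treated like O in this sketch.
    """
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    etype: Optional[str] = None
    for token, tag in sentence:
        if tag.startswith("I-") and tokens and tag[2:] == etype:
            tokens.append(token)  # continue the open span
            continue
        if tokens:
            entities.append((" ".join(tokens), etype))  # close the open span
            tokens, etype = [], None
        if tag.startswith("B-"):
            tokens, etype = [token], tag[2:]  # open a new span
    if tokens:
        entities.append((" ".join(tokens), etype))
    return entities


sample = """\
Visuals O
of O
the O
avalanche O
site O
in O
Gurez B-location
sector I-location
. O
"""

sentences = parse_conll(sample.splitlines())
print(extract_entities(sentences[0]))  # [('Gurez sector', 'location')]
```

Splitting on the last whitespace run keeps the tag in the final column regardless of whether the files use tabs or spaces, which this page does not specify.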

Provider:

maas

Created:

2022-10-17

Collection Summary

Dataset Introduction