WNUT17 Named Entity Recognition Dataset
Source: ModelScope community. Updated 2026-04-20; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/iic/wnut17_ner
Description:
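# WNUT17 Named Entity Recognition Dataset
## Dataset overview
WNUT17 is an English named entity recognition dataset for social media text.
### Dataset summary
The dataset contains a training set (3394 samples), a validation set (1009 samples), and a test set (1287 samples). The entity types are corporation, creative-work, group, location, person, and product.
### Data format and structure
The data follows the CoNLL (Conference on Computational Natural Language Learning) format and has two columns: the first column holds the tokens of the input sentence, and the second holds the named entity type label for each token. A concrete example: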
```
Visuals O
of O
the O
avalanche O
site O
in O
Gurez B-location
sector I-location
. O
```
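The two-column layout above can be read with a few lines of Python. The following is a minimal sketch, not the dataset's official loader: the function names `parse_conll` and `extract_entities` are hypothetical, and it assumes that sentences are separated by blank lines (the usual CoNLL convention, not stated on this page) and that the `B-`/`I-` prefixes mark the beginning and continuation of an entity, as in the example.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def parse_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group two-column lines into sentences.

    Assumption: a blank line ends a sentence.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # The tag is the last whitespace-separated column.
        token, tag = line.rsplit(None, 1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) spans from B-/I- tags."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    etype: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        )
        if starts_new:
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [token], tag[2:]
        elif tag.startswith("I-"):
            words.append(token)
        else:  # "O" closes any open entity
            if words:
                entities.append((" ".join(words), etype))
            words, etype = [], None
    if words:
        entities.append((" ".join(words), etype))
    return entities


sample = """Visuals O
of O
the O
avalanche O
site O
in O
Gurez B-location
sector I-location
. O
"""

sentences = parse_conll(sample.splitlines())
entities = extract_entities(sentences[0])
print(entities)  # [('Gurez sector', 'location')]
```

Splitting on the last run of whitespace keeps the tag intact even if a token itself contained spaces. A line with only one column would raise a `ValueError` here, which is deliberate for a sketch: malformed rows are surfaced rather than silently skipped.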
## Dataset license
Creative Commons Attribution 4.0 International.
## Citation
```bib
@inproceedings{derczynski-etal-2017-results,
title = "Results of the {WNUT}2017 Shared Task on Novel and Emerging Entity Recognition",
author = "Derczynski, Leon and
Nichols, Eric and
van Erp, Marieke and
Limsopatham, Nut",
booktitle = "Proceedings of the 3rd Workshop on Noisy User-generated Text",
month = sep,
year = "2017",
address = "Copenhagen, Denmark",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W17-4418",
doi = "10.18653/v1/W17-4418",
pages = "140--147",
abstract = "This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions.
Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization),
but recall on them is a real problem in noisy text - even among annotators.
This drop tends to be due to novel entities and surface forms.
Take for example the tweet {``}so.. kktny in 30 mins?!{''} {--} even human experts find the entity {`}kktny{'}
hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities,
and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the
ability of participating entries to detect and classify novel and emerging named entities in noisy text.",
}
```
Provider:
maas
Created:
2022-10-17
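Summary:
WNUT17 is an English named entity recognition dataset built for social media text. It comprises training, validation, and test sets, and its entity types are corporation, creative work, group, location, person, and product. The data follows the CoNLL format: each line holds a token and its entity type label.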