tner/tweebank_ner
Source: Hugging Face, updated 2022-11-27, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/tner/tweebank_ner
Summary:
The TweeBank NER dataset is a component of the TNER project, primarily used for named entity recognition (NER) tasks. It consists of text data sourced from Twitter, and defines four entity types: LOC (Location), MISC (Miscellaneous), PER (Person), and ORG (Organization). The dataset is split into training, validation, and test sets, which contain 1639, 710, and 1201 samples respectively.
Provider:
tner
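Original Information Summary
Dataset Overview
Basic Information
- Name: TweeBank NER
- Domain: Twitter
- Number of entity types: 4
- Language: English
- Multilinguality: monolingual
- License: other
- Size category: 1K<n<10K
- Task category: token classification
- Task ID: named entity recognition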
Dataset Details
Dataset Summary
- Format: formatted according to the TNER project conventions
- Entity types: LOC, MISC, PER, ORG
Dataset Structure
Data Instances
- Example: {"tokens": ["RT", "@USER2362", ":", "Farmall", "Heart", "Of", "The", "Holidays", "Tabletop", "Christmas", "Tree", "With", "Lights", "And", "Motion", "URL1087", "#Holiday", "#Gifts"], "tags": [8, 8, 8, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]}
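Each instance pairs a list of tokens with a list of integer tag IDs of the same length. The sketch below shows one way to load and sanity-check such a record. It assumes records are JSON objects with `tokens` and `tags` keys, as in the example; the names `NerInstance` and `parse_instance` are hypothetical helpers, not part of the TNER project.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class NerInstance:
    """One TweeBank NER record: parallel lists of tokens and integer tag IDs."""
    tokens: List[str]
    tags: List[int]

    def __post_init__(self) -> None:
        # Token classification requires exactly one tag per token.
        if len(self.tokens) != len(self.tags):
            raise ValueError(f"{len(self.tokens)} tokens but {len(self.tags)} tags")


def parse_instance(line: str) -> NerInstance:
    """Parse one JSON record with "tokens" and "tags" keys (hypothetical helper)."""
    record = json.loads(line)
    return NerInstance(tokens=record["tokens"], tags=record["tags"])


# The example instance from the dataset card, serialized as one JSON line.
EXAMPLE = json.dumps({
    "tokens": ["RT", "@USER2362", ":", "Farmall", "Heart", "Of", "The",
               "Holidays", "Tabletop", "Christmas", "Tree", "With", "Lights",
               "And", "Motion", "URL1087", "#Holiday", "#Gifts"],
    "tags": [8, 8, 8, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
})

instance = parse_instance(EXAMPLE)
# List the tokens whose tag differs from 8, the tag on most tokens in this example.
print([tok for tok, tag in zip(instance.tokens, instance.tags) if tag != 8])
# → ['Farmall']
```

Rejecting misaligned records at construction time keeps a length mismatch from silently shifting every later tag onto the wrong token.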
Label IDs
- Label mapping: can be found here.
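The label mapping itself is not reproduced on this page. As a hedged sketch of how integer tags become entity spans, the code below assumes a BIO (B-/I-/O) scheme over the four entity types. `HYPOTHETICAL_LABEL2ID` and `decode_spans` are illustrative names, and the ID assignments are an assumption to be replaced with the dataset's published mapping. Under that assumed mapping, tag 2 in the example instance decodes to an ORG span for "Farmall".

```python
from typing import Dict, List, Tuple

# ASSUMPTION: illustrative label map only. It uses a BIO scheme over
# LOC, MISC, ORG, PER plus "O"; swap in the dataset's real mapping before use.
HYPOTHETICAL_LABEL2ID: Dict[str, int] = {
    "B-LOC": 0, "B-MISC": 1, "B-ORG": 2, "B-PER": 3,
    "I-LOC": 4, "I-MISC": 5, "I-ORG": 6, "I-PER": 7,
    "O": 8,
}


def decode_spans(
    tokens: List[str], tags: List[int], label2id: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    id2label = {i: label for label, i in label2id.items()}
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span, if any, and record it.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag_id in zip(tokens, tags):
        label = id2label[tag_id]
        if label == "O":
            flush()
        elif label.startswith("B-") or label[2:] != current_type:
            # A B- tag, or an I- tag that does not continue the open span,
            # starts a new entity.
            flush()
            current_type, current_tokens = label[2:], [token]
        else:
            # An I- tag of the same type extends the open span.
            current_tokens.append(token)
    flush()
    return spans


tokens = ["RT", "@USER2362", ":", "Farmall", "Heart", "Of", "The", "Holidays"]
tags = [8, 8, 8, 2, 8, 8, 8, 8]
print(decode_spans(tokens, tags, HYPOTHETICAL_LABEL2ID))
# → [('ORG', 'Farmall')]  (under the assumed label map)
```

Treating a stray I- tag as the start of a new entity is one common lenient convention; a stricter decoder could reject such sequences instead.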
Data Splits
| Name | Train | Validation | Test |
|---|---|---|---|
| tweebank_ner | 1639 | 710 | 1201 |
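To put the split sizes in proportion, the short calculation below turns the counts from the table into shares of the total. It is a plain arithmetic sketch; `split_shares` is a hypothetical helper name.

```python
from typing import Dict


def split_shares(splits: Dict[str, int]) -> Dict[str, float]:
    """Return each split's fraction of the total number of instances."""
    total = sum(splits.values())
    return {name: size / total for name, size in splits.items()}


# Split sizes copied from the Data Splits table.
SPLITS = {"train": 1639, "validation": 710, "test": 1201}

print("total:", sum(SPLITS.values()))
for name, share in split_shares(SPLITS).items():
    print(f"{name}: {SPLITS[name]} ({share:.1%})")
```

The three splits add up to 3,550 instances, roughly 46.2% train, 20.0% validation, and 33.8% test, so the test set is comparatively large for a corpus of this size.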
Citation Information
@article{DBLP:journals/corr/abs-2201-07281,
  author = {Hang Jiang and Yining Hua and Doug Beeferman and Deb Roy},
  title = {Annotating the Tweebank Corpus on Named Entity Recognition and Building {NLP} Models for Social Media Analysis},
  journal = {CoRR},
  volume = {abs/2201.07281},
  year = {2022},
  url = {https://arxiv.org/abs/2201.07281},
  eprinttype = {arXiv},
  eprint = {2201.07281},
  timestamp = {Fri, 21 Jan 2022 13:57:15 +0100},
  biburl = {https://dblp.org/rec/journals/corr/abs-2201-07281.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

The content above was collected and summarized by 遇见数据集.



