DFKI-SLT/few-nerd
Source: Hugging Face | Updated: 2025-10-20 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/DFKI-SLT/few-nerd
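Summary:
Few-NERD is a large-scale, fine-grained, manually annotated named entity recognition dataset. It contains 8 coarse-grained types and 66 fine-grained types, with 188,200 sentences, 491,711 entities, and 4,601,223 tokens. The dataset supports both supervised and few-shot learning through three benchmark tasks: Few-NERD (SUP), Few-NERD (INTRA), and Few-NERD (INTER). Annotations were produced by experts, the language is English, and the license is CC BY-SA 4.0.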
Provider:
DFKI-SLT
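Summary of original information
Dataset overview
Dataset name
- Name: Few-NERD
- Aliases: Few-NERD (SUP), Few-NERD (INTRA), Few-NERD (INTER)
Dataset description
- Overview: Few-NERD is a large-scale, fine-grained, manually annotated named entity recognition dataset with 8 coarse-grained types, 66 fine-grained types, 188,200 sentences, 491,711 entities, and 4,601,223 tokens.
- Tasks: named entity recognition (NER), few-shot NER
- Language: English
- License: CC BY-SA 4.0
- Size category: 100K<n<1M
- Source data: extended from Wikipedia
- Task category: token classification
- Tagging scheme: IO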
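Under an IO tagging scheme, each token carries either an "outside" tag or the tag of its entity type, with no B-/I- prefix to mark where an entity begins. The Python sketch below shows one way to turn such tags into entity spans. It assumes that tag id 0 means "outside", and `io_to_spans` is a hypothetical helper written for illustration, not part of any library. The sentence and tag ids follow the field layout given in the dataset structure section.

```python
from itertools import groupby


def io_to_spans(tokens, tags, outside=0):
    """Group runs of identical non-outside tags into (start, end, tag, text) spans.

    `outside` is the id of the "O" tag; 0 is an assumption of this sketch.
    The end index is exclusive.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    position = 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if tag != outside:
            end = position + length
            spans.append((position, end, tag, " ".join(tokens[position:end])))
        position += length
    return spans


tokens = ["It", "starred", "Hicks", "'s", "wife", ",",
          "Ellaline", "Terriss", "and", "Edmund", "Payne", "."]
fine_ner_tags = [0, 0, 51, 0, 0, 0, 50, 50, 0, 50, 50, 0]

spans = io_to_spans(tokens, fine_ner_tags)
for start, end, tag, text in spans:
    print(f"tokens[{start}:{end}] tag={tag} text={text!r}")
```

One consequence of the IO scheme is visible in this sketch: two entities of the same type that sit directly next to each other, with no outside token between them, are merged into a single span.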
Dataset structure
- Data instances:
  - Fields include id, tokens, ner_tags, and fine_ner_tags
  - Example: {"id": "1", "tokens": ["It", "starred", "Hicks", "'s", "wife", ",", "Ellaline", "Terriss", "and", "Edmund", "Payne", "."], "ner_tags": [0, 0, 7, 0, 0, 0, 7, 7, 0, 7, 7, 0], "fine_ner_tags": [0, 0, 51, 0, 0, 0, 50, 50, 0, 50, 50, 0]}
- Data splits:
  - SUP: train 131,767, validation 18,824, test 37,648
  - INTRA: train 99,519, validation 19,358, test 44,059
  - INTER: train 130,112, validation 18,817, test 14,007
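To compare the three benchmark splits, the short Python sketch below sums the listed sentence counts and computes each split's share of the total. The counts are copied from the list above. The dictionary keys and the `summarize` helper are illustrative names chosen for this sketch.

```python
# Sentence counts per benchmark, copied from the split list above.
splits = {
    "SUP": {"train": 131767, "validation": 18824, "test": 37648},
    "INTRA": {"train": 99519, "validation": 19358, "test": 44059},
    "INTER": {"train": 130112, "validation": 18817, "test": 14007},
}


def summarize(counts):
    """Return (total, {split_name: fraction_of_total}) for one benchmark."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for benchmark, counts in splits.items():
    total, fractions = summarize(counts)
    shares = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
    print(f"{benchmark}: total {total:,} ({shares})")
```

With these numbers, the SUP counts sum to 188,239, split roughly 70/10/20 across train, validation, and test. The INTRA and INTER counts as listed each sum to 162,936.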
Dataset creation
- License information: CC BY-SA 4.0
- Citation information:
@inproceedings{ding-etal-2021-nerd,
    title = "Few-{NERD}: A Few-shot Named Entity Recognition Dataset",
    author = "Ding, Ning and Xu, Guangwei and Chen, Yulin and Wang, Xiaobin and Han, Xu and Xie, Pengjun and Zheng, Haitao and Liu, Zhiyuan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.248",
    doi = "10.18653/v1/2021.acl-long.248",
    pages = "3198--3213",
}



