
FEW-NERD

Source: arXiv · Updated 2021-09-01 · Indexed 2024-06-21
Download link:
https://ningding97.github.io/fewnerd/
Overview:
FEW-NERD is a large-scale, manually annotated dataset for few-shot named entity recognition (NER), created jointly by Tsinghua University and Alibaba Group. It contains 188,238 sentences totaling 4,601,160 words, and every word is annotated either as context or as part of an entity with a two-level type. FEW-NERD is the first dataset designed specifically for few-shot NER and one of the largest manually annotated NER datasets. Its text is drawn mainly from Wikipedia and covers 8 coarse-grained and 66 fine-grained entity types, so that models can be evaluated on how well they generalize across a rich set of entity types and contexts. Applications include information extraction and knowledge graph construction; the underlying problem it targets is recognizing new entity types from only a few labeled examples.
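
The two-level annotation scheme described above can be illustrated with a small parsing sketch. It assumes a layout this page does not confirm: one token per line, a tab between token and label, a blank line between sentences, `O` for context words, and entity labels written as `coarse-fine`. The function names `split_label` and `parse_sentences` and the sample text are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, coarse type, fine type)


def split_label(label: str) -> Tuple[str, str]:
    """Split an assumed 'coarse-fine' label; 'O' marks a context word."""
    if label == "O":
        return ("O", "O")
    # Split on the first hyphen only, in case the fine type contains one.
    coarse, _, fine = label.partition("-")
    return (coarse, fine)


def parse_sentences(text: str) -> List[List[Token]]:
    """Group tab-separated 'word<TAB>label' lines into sentences."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence (assumed convention).
            if current:
                sentences.append(current)
                current = []
            continue
        word, label = line.split("\t")
        coarse, fine = split_label(label)
        current.append((word, coarse, fine))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample in the assumed layout, not taken from the dataset.
sample = (
    "Paris\tlocation-GPE\n"
    "is\tO\n"
    "lovely\tO\n"
    "\n"
    "Tsinghua\torganization-education\n"
    "University\torganization-education\n"
)

parsed = parse_sentences(sample)
for sentence in parsed:
    print(sentence)
```

Keeping the coarse and fine types as separate fields makes it straightforward to evaluate a model at either level, or to hold out whole coarse types when building few-shot splits.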
Provider:
Tsinghua University
Created:
2021-05-16