
thunlp/few_rel

Hugging Face, updated 2024-01-18, indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/thunlp/few_rel
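Overview:
FewRel is a large-scale few-shot relation extraction dataset containing more than one hundred relations and tens of thousands of annotated instances across different domains. The text is primarily English, drawn from Wikipedia with crowdsourced English annotations. Each record carries fields such as the relation, the text tokens, the head entity, and the tail entity, and the data is divided into several splits, such as train_wiki and val_nyt. The dataset was created by Xu Han and colleagues and is released under the MIT License.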
Provider:
thunlp
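Summary of original information

Dataset overview

Name: Few-Shot Relation Classification Dataset (FewRel)

Language: English (en)

License: MIT

Multilinguality: monolingual

Size categories:

  • fewer than 1K (n<1K)
  • 10K to 100K (10K<n<100K)

Source data: original

Task category: other

Configuration names:

  • default
  • pid2name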

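Dataset structure

Features:

  • relation: string, the PID of the relation.
  • tokens: sequence of strings, the tokens of the text.
  • head: struct containing:
    • text: string, the head entity.
    • type: string, the head entity type.
    • indices: sequence of sequences of integers, the token indices of the entity.
  • tail: struct containing:
    • text: string, the tail entity.
    • type: string, the tail entity type.
    • indices: sequence of sequences of integers, the token indices of the entity.
  • names: sequence of strings, the names of the relation.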
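
The field layout above can be illustrated with a minimal Python sketch. The record, the sentence, the PID, the type strings, and the pid2name row are hypothetical values shaped like the documented schema, not rows taken from the dataset; the helper names `entity_mentions` and `relation_names` are likewise assumptions made for illustration. It assumes that each inner list in `indices` holds the token positions of one mention.

```python
from typing import Dict, List

# Hypothetical record shaped like the documented schema. The sentence, PID,
# and type strings are illustrative placeholders, not real dataset rows.
record = {
    "relation": "P19",
    "tokens": ["Marie", "Curie", "was", "born", "in", "Warsaw", "."],
    "head": {"text": "Marie Curie", "type": "HEAD_TYPE", "indices": [[0, 1]]},
    "tail": {"text": "Warsaw", "type": "TAIL_TYPE", "indices": [[5]]},
}

# Hypothetical row of the pid2name configuration (relation PID plus names).
pid2name_rows = [
    {"relation": "P19", "names": ["place of birth", "illustrative description"]},
]


def entity_mentions(tokens: List[str], entity: dict) -> List[str]:
    """Rebuild each mention of an entity from its lists of token indices."""
    return [" ".join(tokens[i] for i in span) for span in entity["indices"]]


def relation_names(rows: List[dict]) -> Dict[str, List[str]]:
    """Map each relation PID to its list of names."""
    return {row["relation"]: row["names"] for row in rows}


pid_to_names = relation_names(pid2name_rows)
print(entity_mentions(record["tokens"], record["head"]))
print(entity_mentions(record["tokens"], record["tail"]))
print(pid_to_names[record["relation"]][0])
```

Because `indices` is a sequence of sequences, an entity mentioned more than once in a sentence would yield one string per mention.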

Data splits:

  • train_wiki: 44800 examples
  • val_nyt: 2500 examples
  • val_pubmed: 1000 examples
  • val_semeval: 8851 examples
  • val_wiki: 11200 examples
  • pubmed_unsupervised: 2500 examples
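
As a rough sketch of how these splits might be loaded and checked, the following Python uses `load_dataset` from the Hugging Face `datasets` library. The repository ID `thunlp/few_rel` and the split names come from this page; the helper names `load_split` and `matches_listing` are hypothetical. It assumes the `datasets` package is installed and the hub is reachable, and whether the call succeeds may depend on the installed library version.

```python
# Split sizes of the default configuration, copied from the listing above.
SPLIT_SIZES = {
    "train_wiki": 44800,
    "val_nyt": 2500,
    "val_pubmed": 1000,
    "val_semeval": 8851,
    "val_wiki": 11200,
    "pubmed_unsupervised": 2500,
}


def load_split(split: str):
    """Load one split of the default configuration (needs network access)."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the size table above stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset("thunlp/few_rel", "default", split=split)


def matches_listing(split: str, dataset) -> bool:
    """Check that a loaded split has the number of examples listed above."""
    return len(dataset) == SPLIT_SIZES[split]


# Total number of examples across all listed splits.
total_examples = sum(SPLIT_SIZES.values())
print(total_examples)
```

A call such as `matches_listing("val_nyt", load_split("val_nyt"))` would then compare a downloaded split against the count given on this page.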

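Download size: 22674323 bytes (about 21.6 MiB)

Dataset size: 30708599 bytes (default configuration, about 29.3 MiB) / 81607 bytes (pid2name configuration, about 79.7 KiB)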

Dataset creation

Annotation creators:

  • crowdsourced
  • machine-generated

Language creators: found

License information:

MIT License

Copyright (c) 2018 THUNLP

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Citation information:

@inproceedings{han-etal-2018-fewrel,
    title = "{F}ew{R}el: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation",
    author = "Han, Xu and Zhu, Hao and Yu, Pengfei and Wang, Ziyun and Yao, Yuan and Liu, Zhiyuan and Sun, Maosong",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1514",
    doi = "10.18653/v1/D18-1514",
    pages = "4803--4809"
}

@inproceedings{gao-etal-2019-fewrel,
    title = "{F}ew{R}el 2.0: Towards More Challenging Few-Shot Relation Classification",
    author = "Gao, Tianyu and Han, Xu and Zhu, Hao and Liu, Zhiyuan and Li, Peng and Sun, Maosong and Zhou, Jie",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1649",
    doi = "10.18653/v1/D19-1649",
    pages = "6251--6256"
}
