
rajpurkar/squad_v2

Source: Hugging Face. Updated: 2024-03-04. Indexed: 2024-04-19.
Download link:
https://hf-mirror.com/datasets/rajpurkar/squad_v2
Description:
The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is either a span of text from the corresponding reading passage, or the question is unanswerable. SQuAD 2.0 combines the 100,000 questions from SQuAD 1.1 with over 50,000 unanswerable questions that crowdworkers wrote to look similar to answerable ones. To perform well on SQuAD 2.0, a system must not only answer questions when possible, but also recognize when the passage supports no answer and abstain from answering.
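
The requirement to abstain on unanswerable questions can be illustrated with a simplified exact-match scorer. This is only a sketch, not the official SQuAD 2.0 evaluation script: the function names, the convention that an empty string means "abstain", the lowercase/strip normalization, and the sample question IDs are all assumptions made for illustration.

```python
from __future__ import annotations


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """Simplified exact match; an empty prediction means the system abstained."""
    normalized = prediction.strip().lower()
    if not gold_answers:
        # Unanswerable question: only abstaining counts as correct.
        return normalized == ""
    # Answerable question: the prediction must equal one of the gold answers.
    return any(normalized == gold.strip().lower() for gold in gold_answers)


def score(predictions: dict[str, str], references: dict[str, list[str]]) -> float:
    """Percentage of questions whose prediction is an exact match."""
    if not references:
        return 0.0
    correct = sum(
        exact_match(predictions.get(qid, ""), golds)
        for qid, golds in references.items()
    )
    return 100.0 * correct / len(references)


# Hypothetical references: q1 is answerable, q2 is unanswerable (no gold answers).
references = {"q1": ["Paris"], "q2": []}

# A system that abstains on q2 is fully correct under this scorer.
abstaining_system = {"q1": "paris", "q2": ""}
# A system that always answers is penalized on the unanswerable question.
always_answering_system = {"q1": "paris", "q2": "London"}

print(score(abstaining_system, references))
print(score(always_answering_system, references))
```

Under this scoring, a system that always produces an answer cannot get credit for any unanswerable question, which is why detecting unsupported questions matters.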
Provider:
rajpurkar
Summary of original information

Dataset overview

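  • Name: SQuAD2.0
  • Language: English (en)
  • License: CC BY-SA 4.0
  • Multilinguality: monolingual
  • Size: 100K<n<1M
  • Source: original data
  • Task category: question answering
  • Task IDs:
    • open-domain-qa
    • extractive-qa
  • Papers with Code ID: squad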

Dataset structure

Data instances

  • Download size: 46.49 MB
  • Generated dataset size: 128.52 MB
  • Total disk usage: 175.02 MB

Data fields

  • id: string
  • title: string
  • context: string
  • question: string
  • answers: dictionary containing:
    • text: string
    • answer_start: int32
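
The field layout above can be sketched with two hand-written records. The records below are hypothetical, not taken from the dataset. The sketch also assumes that, in the Hugging Face copy, `answers["text"]` and `answers["answer_start"]` are parallel lists that are empty for unanswerable questions, and that `answer_start` is a character offset into `context`. The helper names are illustrative.

```python
def is_answerable(example: dict) -> bool:
    """Assumption: unanswerable questions carry an empty list of answer texts."""
    return len(example["answers"]["text"]) > 0


def spans_match_context(example: dict) -> bool:
    """Check that every answer text appears in the context at its answer_start offset."""
    context = example["context"]
    answers = example["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


# Hypothetical answerable record using the field names listed above.
context = "The Eiffel Tower is located in Paris."
answerable = {
    "id": "example-0001",
    "title": "Example_Article",
    "context": context,
    "question": "Where is the Eiffel Tower located?",
    "answers": {"text": ["Paris"], "answer_start": [context.index("Paris")]},
}

# Hypothetical unanswerable record: the passage does not state who designed the tower.
unanswerable = {
    "id": "example-0002",
    "title": "Example_Article",
    "context": context,
    "question": "Who designed the Eiffel Tower?",
    "answers": {"text": [], "answer_start": []},
}

for record in (answerable, unanswerable):
    print(record["id"], is_answerable(record), spans_match_context(record))
```

A check like `spans_match_context` is useful after any preprocessing that edits `context`, since changing the passage text can silently invalidate the stored offsets.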

Data splits

Name       Train    Validation
squad_v2   130319   11873
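
The split sizes in the table above can be checked against a downloaded copy. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository ID `rajpurkar/squad_v2` is reachable. The helper names are hypothetical, and the download only happens when `load_split_sizes` is called.

```python
from __future__ import annotations

# Row counts copied from the split table above.
EXPECTED_SPLIT_SIZES = {"train": 130319, "validation": 11873}


def split_size_mismatches(actual: dict[str, int]) -> dict[str, tuple[int, int | None]]:
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        name: (expected, actual.get(name))
        for name, expected in EXPECTED_SPLIT_SIZES.items()
        if actual.get(name) != expected
    }


def load_split_sizes(repo_id: str = "rajpurkar/squad_v2") -> dict[str, int]:
    """Download the dataset and return the row count of each split (needs network access)."""
    # Imported lazily so the pure-Python checker above has no third-party dependency.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset.items()}


# Total number of questions across both splits.
print(sum(EXPECTED_SPLIT_SIZES.values()))
```

With network access, `split_size_mismatches(load_split_sizes())` would be expected to return an empty dictionary if the downloaded copy matches the table.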

Dataset creation

Licensing information

  • License: CC BY-SA 4.0

Citation information

@inproceedings{rajpurkar-etal-2018-know,
    title = "Know What You Don{'}t Know: Unanswerable Questions for {SQ}u{AD}",
    author = "Rajpurkar, Pranav and Jia, Robin and Liang, Percy",
    editor = "Gurevych, Iryna and Miyao, Yusuke",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P18-2124",
    doi = "10.18653/v1/P18-2124",
    pages = "784--789",
    eprint = {1806.03822},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

@inproceedings{rajpurkar-etal-2016-squad,
    title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
    author = "Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy",
    editor = "Su, Jian and Duh, Kevin and Carreras, Xavier",
    booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2016",
    address = "Austin, Texas",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D16-1264",
    doi = "10.18653/v1/D16-1264",
    pages = "2383--2392",
    eprint = {1606.05250},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
}

Collected summary
Dataset introduction
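Background and challenges
Background overview
SQuAD 2.0 is an English reading comprehension dataset containing both answerable and unanswerable questions, designed to test a system's accuracy and judgment when answering questions. It combines the questions from SQuAD 1.1 with newly added unanswerable questions, requiring systems to recognize when a passage does not support an answer.
The above content was collected and summarized by 遇见数据集.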