rongzhangibm/NaturalQuestionsV2

Name: rongzhangibm/NaturalQuestionsV2
Creator: rongzhangibm
Published: 2022-07-07 05:22:20
License: no description provided

Source: Hugging Face. Updated: 2022-07-07. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/rongzhangibm/NaturalQuestionsV2

Overview:

The Natural Questions (NQ) corpus contains questions from real users and requires question answering systems to read and comprehend an entire Wikipedia article to find the answer. Because the questions come from real users and systems must read the whole page to locate the answer, NQ is a more realistic and challenging task than earlier question answering datasets.

Provider:

rongzhangibm

Summary of original information

Dataset overview

Dataset description

Dataset summary

Name: Natural Questions
Language: English (en)
License: Creative Commons Attribution-ShareAlike 3.0 Unported (cc-by-sa-3.0)
Multilinguality: monolingual
Size: 100K<n<1M
Task category: question answering (question-answering)
Task ID: open-domain-qa

Dataset structure

Data instances

Download size: 42981 MB
Generated dataset size: 139706 MB
Total disk usage: 182687 MB
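
The three figures above are consistent: the download size plus the generated size equals the total disk usage. Since the total is large, it may be worth checking free space before fetching anything. The following is a minimal sketch of such a check; the helper name `has_room`, the constant names, and the reading of "MB" as 1,000,000 bytes are assumptions made for illustration and do not come from the dataset card.

```python
import shutil

# Sizes as listed above, in MB.
DOWNLOAD_MB = 42_981
GENERATED_MB = 139_706
TOTAL_MB = 182_687

# Assumption: 1 MB = 1,000,000 bytes; the listing does not say MB vs MiB.
BYTES_PER_MB = 1_000_000


def has_room(path: str, required_mb: int = TOTAL_MB) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * BYTES_PER_MB


if __name__ == "__main__":
    print("sizes add up:", DOWNLOAD_MB + GENERATED_MB == TOTAL_MB)
    print("enough room in current directory:", has_room("."))
```

Passing a smaller `required_mb` (for example `DOWNLOAD_MB`) checks only for the raw download rather than the fully generated dataset.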

Data fields

id: string
document:
- title: string
- url: string
- html: string
- tokens: sequence with the following fields:
  - token: string
  - is_html: boolean
  - start_byte: integer
  - end_byte: integer
question:
- text: string
- tokens: sequence of strings
long_answer_candidates: sequence with the following fields:
- start_token: integer
- end_token: integer
- start_byte: integer
- end_byte: integer
- top_level: boolean
annotations: sequence with the following fields:
- id: string
- long_answer:
  - start_token: integer
  - end_token: integer
  - start_byte: integer
  - end_byte: integer
  - candidate_index: integer
- short_answers: sequence with the following fields:
  - start_token: integer
  - end_token: integer
  - start_byte: integer
  - end_byte: integer
  - text: string
- yes_no_answer: class label, one of ["NO", "YES"]
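
To make the field layout above more concrete, here is a minimal sketch of how an answer span could be turned back into text from `document.tokens`. It rests on several assumptions that are not stated in the listing: that token sequences arrive as a dictionary of parallel lists (the way the Hugging Face `datasets` library typically materializes a sequence of structured fields), that `start_token`/`end_token` describe a half-open range, and that a negative `yes_no_answer` value means no label was given. The helper names and the toy record are hypothetical.

```python
from typing import Dict, List, Optional

YES_NO_NAMES = ["NO", "YES"]  # class-label order as listed in the field description


def span_text(tokens: Dict[str, List], start_token: int, end_token: int) -> str:
    """Join the non-HTML tokens in the half-open range [start_token, end_token)."""
    pairs = zip(tokens["token"][start_token:end_token],
                tokens["is_html"][start_token:end_token])
    return " ".join(tok for tok, is_html in pairs if not is_html)


def decode_yes_no(label: int) -> Optional[str]:
    """Map a class-label integer to its name; negative means 'no label' (assumption)."""
    return YES_NO_NAMES[label] if label >= 0 else None


# Hypothetical toy record; only the fields used here are filled in.
toy_tokens = {
    "token": ["<P>", "Paris", "is", "the", "capital", "of", "France", ".", "</P>"],
    "is_html": [True, False, False, False, False, False, False, False, True],
}
toy_long_answer = {"start_token": 0, "end_token": 9}
toy_short_answer = {"start_token": 1, "end_token": 2}

if __name__ == "__main__":
    print(span_text(toy_tokens, **toy_long_answer))   # long answer without HTML tags
    print(span_text(toy_tokens, **toy_short_answer))  # short answer
    print(decode_yes_no(1))
```

Filtering on `is_html` keeps markup tokens out of the reconstructed text; the byte offsets (`start_byte`, `end_byte`) could be used instead to slice `document.html` when the original markup is needed.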

Data splits

Name	Train	Validation
default	307373	7830
dev	N/A	7830
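
The split table above can be read as roughly 40 training examples for every validation example. The sketch below computes the validation share and shows one possible way to open a split lazily with the Hugging Face `datasets` library. Whether this repository loads directly under the config name `default` and the split names `train` and `validation` is an assumption, and may depend on the installed `datasets` version and on how the repository is packaged; the helper names are hypothetical.

```python
REPO_ID = "rongzhangibm/NaturalQuestionsV2"

# Example counts from the split table above (config "default").
SPLIT_SIZES = {"train": 307_373, "validation": 7_830}


def validation_share(sizes: dict = SPLIT_SIZES) -> float:
    """Fraction of all examples that fall in the validation split."""
    return sizes["validation"] / (sizes["train"] + sizes["validation"])


def stream_split(split: str = "validation", config: str = "default"):
    """Open one split lazily instead of downloading everything first.

    Assumptions: the `datasets` package is installed, the repository loads
    under the config and split names used here, and network access is available.
    """
    from datasets import load_dataset  # imported lazily; third-party dependency

    return load_dataset(REPO_ID, config, split=split, streaming=True)


if __name__ == "__main__":
    print(f"validation share: {validation_share():.2%}")
```

Streaming avoids materializing the full dataset on disk, which can be helpful when only a few examples are needed for inspection.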

Licensing information

License: Creative Commons Attribution-ShareAlike 3.0 Unported

Citation information

@article{47761,
  title = {Natural Questions: a Benchmark for Question Answering Research},
  author = {Tom Kwiatkowski and Jennimaria Palomaki and Olivia Redfield and Michael Collins and Ankur Parikh and Chris Alberti and Danielle Epstein and Illia Polosukhin and Matthew Kelcey and Jacob Devlin and Kenton Lee and Kristina N. Toutanova and Llion Jones and Ming-Wei Chang and Andrew Dai and Jakob Uszkoreit and Quoc Le and Slav Petrov},
  year = {2019},
  journal = {Transactions of the Association for Computational Linguistics}
}
