Natural_Questions

Name: Natural_Questions
Creator: maas
Published: 2025-10-16 20:06:54
License: No description available

ModelScope Community | Updated 2025-10-16 | Indexed 2024-08-31

Download link:

https://modelscope.cn/datasets/OmniData/Natural_Questions

Resource description:

---
displayName: Natural Questions
labelTypes:
  - English Corpus
license:
  - CC BY-SA 3.0
mediaTypes:
  - Text
paperUrl: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/b8c26e4347adc3453c15d96a09e6f7f102293f71.pdf
publishDate: "2019"
publishUrl: https://ai.google.com/research/NaturalQuestions
publisher:
  - Google Research
tags:
  - Essay
taskTypes:
  - Question Generation
  - Open-Domain Question Answering
  - Visual Question Answering
---

# Dataset Introduction

## Overview

The Natural Questions corpus is a question answering dataset containing 307,373 training examples, 7,830 development examples, and 7,842 test examples. Each example consists of a google.com query and a corresponding Wikipedia page. On each Wikipedia page, annotators mark a passage that answers the question (the long answer) and one or more short spans within that passage that contain the actual answer (the short answers).

Either annotation can be empty:

- If both the long answer and the short answers are empty, the page contains no answer to the question.
- If the long answer is present but the short answers are empty, the annotated passage answers the question, but no explicit short answer could be found.
- In about 1% of the documents, the passage is annotated with a short answer of "yes" or "no" instead of a list of short spans.

## Class Definitions

null

## Citation

```
@article{kwiatkowski2019natural,
  title={Natural questions: a benchmark for question answering research},
  author={Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Devlin, Jacob and Lee, Kenton and others},
  journal={Transactions of the Association for Computational Linguistics},
  volume={7},
  pages={453--466},
  year={2019},
  publisher={MIT Press}
}
```

## Download dataset

:modelscope-code[]{type="git"}
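The annotation cases described in the introduction can be sketched in a few lines of Python. This is only an illustration: the `Annotation` record and its field names (`long_answer`, `short_answers`, `yes_no_answer`) are hypothetical simplifications, not the schema of the released files, and treating a missing long answer as "no answer" is an assumption based on the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) offsets; hypothetical representation


@dataclass
class Annotation:
    # Hypothetical, simplified record; the real files use a richer schema.
    long_answer: Optional[Span] = None                      # annotated passage
    short_answers: List[Span] = field(default_factory=list)  # spans in the passage
    yes_no_answer: Optional[str] = None                     # "YES", "NO", or None


def classify(ann: Annotation) -> str:
    """Map an annotation to one of the cases listed in the dataset card."""
    if ann.long_answer is None:
        # Assumption: short spans live inside the long answer, so no long
        # answer means the page holds no answer at all.
        return "no_answer"
    if ann.yes_no_answer in ("YES", "NO"):
        # Roughly 1% of documents: yes/no replaces the short span list.
        return "yes_no"
    if ann.short_answers:
        return "long_and_short"
    # The passage answers the question, but no explicit short answer exists.
    return "long_only"
```

A function like `classify` can be mapped over a split to count how often each case occurs before deciding how a model should handle unanswerable questions.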

Provider:

maas

Created:

2024-07-11

Collected Summary

Dataset Introduction