deepmind/narrativeqa

Name: deepmind/narrativeqa
Creator: deepmind
Published: 2024-03-06 07:33:05
License: No description provided

Source: Hugging Face. Updated 2024-03-06. Indexed 2024-06-15.

Download link:

https://hf-mirror.com/datasets/deepmind/narrativeqa

Resource overview:

---
annotations_creators:
- crowdsourced
language_creators:
- found
language:
- en
license:
- apache-2.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text2text-generation
task_ids:
- abstractive-qa
paperswithcode_id: narrativeqa
pretty_name: NarrativeQA
dataset_info:
  features:
  - name: document
    struct:
    - name: id
      dtype: string
    - name: kind
      dtype: string
    - name: url
      dtype: string
    - name: file_size
      dtype: int32
    - name: word_count
      dtype: int32
    - name: start
      dtype: string
    - name: end
      dtype: string
    - name: summary
      struct:
      - name: text
        dtype: string
      - name: tokens
        sequence: string
      - name: url
        dtype: string
      - name: title
        dtype: string
    - name: text
      dtype: string
  - name: question
    struct:
    - name: text
      dtype: string
    - name: tokens
      sequence: string
  - name: answers
    list:
    - name: text
      dtype: string
    - name: tokens
      sequence: string
  splits:
  - name: train
    num_bytes: 11556607782
    num_examples: 32747
  - name: test
    num_bytes: 3547135501
    num_examples: 10557
  - name: validation
    num_bytes: 1211859418
    num_examples: 3461
  download_size: 3232805701
  dataset_size: 16315602701
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
---

# Dataset Card for Narrative QA

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Repository:** https://github.com/deepmind/narrativeqa
- **Paper:** https://arxiv.org/abs/1712.07040
- **Paper:** https://aclanthology.org/Q18-1023/
- **Point of Contact:** [Tomáš Kočiský](mailto:tkocisky@google.com) [Jonathan Schwarz](mailto:schwarzjn@google.com) [Phil Blunsom](mailto:pblunsom@google.com) [Chris Dyer](mailto:cdyer@google.com) [Karl Moritz Hermann](mailto:kmh@google.com) [Gábor Melis](mailto:melisgl@google.com) [Edward Grefenstette](mailto:etg@google.com)

### Dataset Summary

NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.

### Supported Tasks and Leaderboards

The dataset is used to test reading comprehension. The paper proposes two tasks, "summaries only" and "stories only", which differ in whether the human-generated summary or the full story text is used to answer the question.

### Languages

English

## Dataset Structure

### Data Instances

A typical data point consists of a question-answer pair along with a summary or story that can be used to answer the question. Additional information, such as the URL, word count, and Wikipedia page, is also provided.

A typical example looks like this:

```
{
    "document": {
        "id": "23jncj2n3534563110",
        "kind": "movie",
        "url": "https://www.imsdb.com/Movie%20Scripts/Name%20of%20Movie.html",
        "file_size": 80473,
        "word_count": 41000,
        "start": "MOVIE screenplay by",
        "end": ". THE END",
        "summary": {
            "text": "Joe Bloggs begins his journey exploring...",
            "tokens": ["Joe", "Bloggs", "begins", "his", "journey", "exploring",...],
            "url": "http://en.wikipedia.org/wiki/Name_of_Movie",
            "title": "Name of Movie (film)"
        },
        "text": "MOVIE screenplay by John Doe\nSCENE 1..."
    },
    "question": {
        "text": "Where does Joe Bloggs live?",
        "tokens": ["Where", "does", "Joe", "Bloggs", "live", "?"]
    },
    "answers": [
        {"text": "At home", "tokens": ["At", "home"]},
        {"text": "His house", "tokens": ["His", "house"]}
    ]
}
```

### Data Fields

- `document.id` - Unique ID for the story.
- `document.kind` - "movie" or "gutenberg" depending on the source of the story.
- `document.url` - The URL where the story was downloaded from.
- `document.file_size` - File size (in bytes) of the story.
- `document.word_count` - Number of tokens in the story.
- `document.start` - First 3 tokens of the story. Used for verifying the story hasn't been modified.
- `document.end` - Last 3 tokens of the story. Used for verifying the story hasn't been modified.
- `document.summary.text` - Text of the Wikipedia summary of the story.
- `document.summary.tokens` - Tokenized version of `document.summary.text`.
- `document.summary.url` - Wikipedia URL of the summary.
- `document.summary.title` - Wikipedia title of the summary.
- `question` - `{"text":"...", "tokens":[...]}` for the question about the story.
- `answers` - List of `{"text":"...", "tokens":[...]}` for valid answers for the question.

### Data Splits

The data is split into training, validation, and test sets based on story (i.e. the same story cannot appear in more than one split):

| Train | Valid | Test |
| ------ | ----- | ----- |
| 32747 | 3461 | 10557 |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

Stories and movie scripts were downloaded from [Project Gutenberg](https://www.gutenberg.org) and a range of movie script repositories (mainly [imsdb](http://www.imsdb.com)).

#### Who are the source language producers?

The language producers are the authors of the stories and scripts, as well as Amazon Mechanical Turk workers for the questions.

### Annotations

#### Annotation process

Amazon Mechanical Turk workers were provided with human-written summaries of the stories (to make the annotation tractable and to lead annotators towards asking non-localized questions). Stories were matched with plot summaries from Wikipedia using titles, and the matching was verified with help from human annotators. The annotators were asked to determine if both the story and the summary refer to a movie or a book (as some books are made into movies), or if they are the same part in a series produced in the same year.

Annotators on Amazon Mechanical Turk were instructed to write 10 question-answer pairs each, based solely on a given summary. Annotators were instructed to imagine that they are writing questions to test students who have read the full stories but not the summaries. We required questions that are specific enough, given the length and complexity of the narratives, and to provide a diverse set of questions about characters, events, why this happened, and so on. Annotators were encouraged to use their own words and we prevented them from copying. We asked for answers that are grammatical, complete sentences, and explicitly allowed short answers (one word, or a few-word phrase, or a short sentence) as we think that answering with a full sentence is frequently perceived as artificial when asking about factual information. Annotators were asked to avoid extra, unnecessary information in the question or the answer, and to avoid yes/no questions or questions about the author or the actors.

#### Who are the annotators?

Amazon Mechanical Turk workers.

### Personal and Sensitive Information

None

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

The dataset is released under an [Apache-2.0 License](https://github.com/deepmind/narrativeqa/blob/master/LICENSE).

### Citation Information

```
@article{kocisky-etal-2018-narrativeqa,
    title = "The {N}arrative{QA} Reading Comprehension Challenge",
    author = "Ko{\v{c}}isk{\'y}, Tom{\'a}{\v{s}} and Schwarz, Jonathan and Blunsom, Phil and Dyer, Chris and Hermann, Karl Moritz and Melis, G{\'a}bor and Grefenstette, Edward",
    editor = "Lee, Lillian and Johnson, Mark and Toutanova, Kristina and Roark, Brian",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1023",
    doi = "10.1162/tacl_a_00023",
    pages = "317--328",
    abstract = "Reading comprehension (RC){---}in contrast to information retrieval{---}requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.",
}
```

### Contributions

Thanks to [@ghomasHudson](https://github.com/ghomasHudson) for adding this dataset.

Provider:

deepmind

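Summary of Original Information

Dataset Card for NarrativeQA

Dataset Description

Dataset Summary

NarrativeQA is an English-language dataset of stories and corresponding questions, designed to test reading comprehension, especially on long documents.

Supported Tasks and Leaderboards

The dataset is used to test reading comprehension. The paper proposes two tasks, "summaries only" and "stories only", which differ in whether the human-written summary or the full story text is used to answer the question.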
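The two settings differ only in which field of a record the model is allowed to read. A minimal sketch of that selection follows, assuming records shaped like the dataset card's example. The record values, the setting names `summaries_only` and `stories_only`, the function names, and the prompt layout are all illustrative assumptions, not part of the dataset or the paper.

```python
from typing import Any, Dict

# Hypothetical record shaped like the card's example; all values are made up.
EXAMPLE: Dict[str, Any] = {
    "document": {
        "kind": "movie",
        "summary": {"text": "Joe Bloggs begins his journey exploring his home town."},
        "text": "MOVIE screenplay by John Doe\nSCENE 1 Joe leaves his house.",
    },
    "question": {"text": "Where does Joe Bloggs live?"},
    "answers": [{"text": "At home"}, {"text": "His house"}],
}


def select_context(example: Dict[str, Any], setting: str) -> str:
    """Return the text the model may read under the given task setting."""
    document = example["document"]
    if setting == "summaries_only":
        return document["summary"]["text"]  # human-written summary
    if setting == "stories_only":
        return document["text"]  # full book or movie script
    raise ValueError(f"unknown setting: {setting!r}")


def build_prompt(example: Dict[str, Any], setting: str) -> str:
    """Join the chosen context and the question into one input string."""
    context = select_context(example, setting)
    question = example["question"]["text"]
    return f"context: {context}\nquestion: {question}"


print(build_prompt(EXAMPLE, "summaries_only"))
```

Keeping the choice in one function means the same training or evaluation loop can serve both settings. In the "stories only" case the context can be a whole book, so a real pipeline would likely need retrieval or truncation before the prompt is built.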

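Languages

English

Dataset Structure

Data Instances

A typical data point consists of a question-answer pair together with a summary or story that can be used to answer the question. Additional information, such as the URL, word count, and Wikipedia page, is also provided.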
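As a small illustration of that shape, the sketch below pulls the question and its reference answers out of one record. The record is a hypothetical stand-in with made-up values, and the helper name `question_and_answers` is an assumption for this example.

```python
from typing import Any, Dict, List, Tuple

# With the Hugging Face `datasets` library installed, records of this shape
# could presumably be obtained with
#     load_dataset("deepmind/narrativeqa", split="validation")
# That call downloads several gigabytes and is not run here; the record
# below is a hypothetical stand-in with made-up values.
RECORD: Dict[str, Any] = {
    "document": {"id": "23jncj2n3534563110", "kind": "movie"},
    "question": {
        "text": "Where does Joe Bloggs live?",
        "tokens": ["Where", "does", "Joe", "Bloggs", "live", "?"],
    },
    "answers": [
        {"text": "At home", "tokens": ["At", "home"]},
        {"text": "His house", "tokens": ["His", "house"]},
    ],
}


def question_and_answers(example: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return the question text and the list of reference answer texts."""
    question = example["question"]["text"]
    answers = [answer["text"] for answer in example["answers"]]
    return question, answers


question, answers = question_and_answers(RECORD)
print(question, answers)
```

Because each question carries a list of valid answers, an evaluation would normally compare a prediction against every reference rather than only the first one.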

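Data Fields

document.id - Unique ID of the story.
document.kind - Source of the story, either "movie" or "gutenberg".
document.url - URL the story was downloaded from.
document.file_size - Size of the story file, in bytes.
document.word_count - Number of tokens in the story.
document.start - First 3 tokens of the story, used to verify that the story has not been modified.
document.end - Last 3 tokens of the story, used to verify that the story has not been modified.
document.summary.text - Text of the story's Wikipedia summary.
document.summary.tokens - Tokenized version of document.summary.text.
document.summary.url - Wikipedia URL of the summary.
document.summary.title - Wikipedia title of the summary.
question - The question about the story, in the form {"text":"...", "tokens":[...]}.
answers - List of valid answers to the question, each in the form {"text":"...", "tokens":[...]}.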
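The `document.start` and `document.end` fields suggest a simple integrity check on a downloaded story. The sketch below is one way such a check could look, assuming the markers are whitespace-separated tokens. The dataset's actual tokenization is not stated here, so treat both the comparison rule and the sample values as assumptions.

```python
from typing import Dict


def matches_markers(document: Dict[str, str]) -> bool:
    """Check that the story text begins and ends with the recorded marker tokens.

    Assumption: markers and text are compared as whitespace-split tokens.
    """
    tokens = document["text"].split()
    start_tokens = document["start"].split()
    end_tokens = document["end"].split()
    starts_ok = tokens[: len(start_tokens)] == start_tokens
    # Guard against an empty end marker, since tokens[-0:] is the whole list.
    ends_ok = not end_tokens or tokens[-len(end_tokens):] == end_tokens
    return starts_ok and ends_ok


# Hypothetical document with made-up values.
intact = {
    "start": "MOVIE screenplay by",
    "end": "leaves. THE END",
    "text": "MOVIE screenplay by John Doe\nSCENE 1 Joe leaves. THE END",
}
# The same document with its ending cut off.
truncated = dict(intact, text="MOVIE screenplay by John Doe")

print(matches_markers(intact), matches_markers(truncated))
```

A check like this only catches changes at the edges of the file. Comparing `document.file_size` or `document.word_count` as well would catch edits in the middle.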

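Data Splits

The data is split into training, validation, and test sets by story (i.e. the same story cannot appear in more than one split):

Train	Validation	Test
32747	3461	10557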
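The story-level split can be checked mechanically. The sketch below computes each split's share of the examples from the counts above and looks for story IDs that occur in more than one split. The function names and the toy ID lists are assumptions for this example. With real data, the IDs would come from `document.id`.

```python
from typing import Dict, Iterable, Set

# Example counts taken from the table above.
SPLIT_SIZES: Dict[str, int] = {"train": 32747, "validation": 3461, "test": 10557}


def split_shares(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's fraction of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def overlapping_story_ids(ids_by_split: Dict[str, Iterable[str]]) -> Set[str]:
    """Return story IDs that occur in more than one split (expected: none)."""
    first_seen: Dict[str, str] = {}
    overlaps: Set[str] = set()
    for split, story_ids in ids_by_split.items():
        # Many questions share one story, so deduplicate within a split first.
        for story_id in set(story_ids):
            if story_id in first_seen and first_seen[story_id] != split:
                overlaps.add(story_id)
            first_seen.setdefault(story_id, split)
    return overlaps


shares = split_shares(SPLIT_SIZES)
# Toy IDs for illustration only.
leaks = overlapping_story_ids({"train": ["a", "b", "a"], "test": ["c"]})
print(shares, leaks)
```

The counts in the table are question-answer examples, not stories. One story contributes many examples, which is why the check deduplicates IDs within each split.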

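Dataset Creation

Data Collection and Normalization

Stories and movie scripts were downloaded from Project Gutenberg and a range of movie script repositories (mainly imsdb).

Language Producers

The language producers are the authors of the stories and scripts, as well as Amazon Mechanical Turk workers for the questions.

Annotation Process

Amazon Mechanical Turk workers were given human-written summaries of the stories, both to make the annotation tractable and to lead annotators towards asking non-localized questions. Stories were matched with plot summaries from Wikipedia by title, and the matches were verified by human annotators. The annotators were asked to determine whether the story and the summary both refer to a movie or to a book (as some books are made into movies), or whether they are the same part in a series produced in the same year. Annotators on Amazon Mechanical Turk were instructed to write 10 question-answer pairs each, based solely on a given summary. They were instructed to imagine that they were writing questions to test students who had read the full stories but not the summaries. The authors required questions that are specific enough, given the length and complexity of the narratives, and that form a diverse set of questions about characters, events, why something happened, and so on. Annotators were encouraged to use their own words and were prevented from copying. The authors asked for answers that are grammatical, complete sentences, and explicitly allowed short answers (one word, a few-word phrase, or a short sentence), because answering factual questions with a full sentence is often perceived as artificial. Annotators were asked to avoid extra, unnecessary information in the question or the answer, and to avoid yes/no questions and questions about the author or the actors.

Annotators

Amazon Mechanical Turk workers.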

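Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Licensing Information

The dataset is released under the Apache-2.0 License.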

Citation Information

@article{kocisky-etal-2018-narrativeqa,
    title = "The {N}arrative{QA} Reading Comprehension Challenge",
    author = "Ko{\v{c}}isk{\'y}, Tom{\'a}{\v{s}} and Schwarz, Jonathan and Blunsom, Phil and Dyer, Chris and Hermann, Karl Moritz and Melis, G{\'a}bor and Grefenstette, Edward",
    editor = "Lee, Lillian and Johnson, Mark and Toutanova, Kristina and Roark, Brian",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "6",
    year = "2018",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q18-1023",
    doi = "10.1162/tacl_a_00023",
    pages = "317--328",
    abstract = "Reading comprehension (RC){---}in contrast to information retrieval{---}requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.",
}

Contributions

Thanks to @ghomasHudson for adding this dataset.

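Collected Summary

Dataset Introduction

Construction

NarrativeQA was built to test reading comprehension on long documents. Stories and movie scripts were collected from Project Gutenberg and several movie script repositories, and workers on the Amazon Mechanical Turk crowdsourcing platform wrote question-answer pairs for them. Each data point contains a story or script, its summary, a question, and a list of valid answers.

Features

NarrativeQA focuses on reading comprehension over long documents, covering both books and movie scripts. Each data point carries document metadata (kind, source URL, file size, and word count) together with a human-written summary, a crowdsourced question, and a list of valid answers. This structure makes the dataset suited to testing how well a model understands complex narratives and reasons over them.

Usage

Use of NarrativeQA centers on its two tasks: answering questions from the summary only, or from the full story text only. Researchers can use the training, validation, and test sets to train and evaluate reading comprehension models. The structured fields, such as the document's start and end markers, the summary text and URL, and the question text and answer list, supply the context needed for training and evaluation.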

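Background and Challenges

Background Overview

NarrativeQA was published by the DeepMind team in 2018 to advance research on reading comprehension, particularly over long documents. It contains stories with corresponding questions designed to test a model's understanding of complex narratives. Its core research question is how to answer questions by reading entire books or movie scripts rather than relying on local context or superficial information. The dataset is relevant to work on abstractive question answering and long-text understanding.

Current Challenges

NarrativeQA poses challenges of two kinds. First, the task itself: long-document reading comprehension requires a model to integrate information and reason across an entire document rather than rely on local context or superficial information, and standard reading comprehension models struggle on the dataset as a result. Second, construction: questions and answers were produced through a crowdsourcing platform (Amazon Mechanical Turk), which required keeping them diverse and specific while avoiding overly simple or copied ones. Construction also depended on matching stories to summaries accurately, so as not to introduce noise or bias.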

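Common Scenarios

Typical Use Cases

NarrativeQA is used in natural language processing to test and evaluate machine reading comprehension models. By pairing long stories with questions, it requires a model to answer on the basis of the whole narrative, which tests its ability to integrate information and reason over long texts.

Academic Problems Addressed

NarrativeQA targets a weakness of earlier reading comprehension tasks, in which models could rely on shallow information matching. Its questions are designed to require understanding of the underlying narrative, and it provides an experimental platform for research on long-text understanding.

Derived and Related Work

NarrativeQA has motivated related research, especially on long-text understanding and abstractive question answering. Researchers have proposed model architectures and training methods evaluated on it, such as attention-based neural networks and pretrained language models.

The content above was collected and summarized by 遇见数据集.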
