malmaud/onestop_qa

Name: malmaud/onestop_qa
Creator: malmaud
Published: 2024-01-18 11:11:17
License: No description available

Source: Hugging Face. Updated: 2024-01-18. Indexed: 2024-05-25.

Download link:

https://hf-mirror.com/datasets/malmaud/onestop_qa

Resource overview:

---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
- extended|onestop_english
task_categories:
- question-answering
task_ids:
- multiple-choice-qa
paperswithcode_id: onestopqa
pretty_name: OneStopQA
language_bcp47:
- en-US
dataset_info:
  features:
  - name: title
    dtype: string
  - name: paragraph
    dtype: string
  - name: level
    dtype:
      class_label:
        names:
          '0': Adv
          '1': Int
          '2': Ele
  - name: question
    dtype: string
  - name: paragraph_index
    dtype: int32
  - name: answers
    sequence: string
    length: 4
  - name: a_span
    sequence: int32
  - name: d_span
    sequence: int32
  splits:
  - name: train
    num_bytes: 1423090
    num_examples: 1458
  download_size: 118173
  dataset_size: 1423090
---

# Dataset Card for OneStopQA

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [OneStopQA repository](https://github.com/berzak/onestop-qa)
- **Repository:** [OneStopQA repository](https://github.com/berzak/onestop-qa)
- **Paper:** [STARC: Structured Annotations for Reading Comprehension](https://arxiv.org/abs/2004.14797)
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]

### Dataset Summary

OneStopQA is a multiple choice reading comprehension dataset annotated according to the STARC (Structured Annotations for Reading Comprehension) scheme. The reading materials are Guardian articles taken from the [OneStopEnglish corpus](https://github.com/nishkalavallabhi/OneStopEnglishCorpus). Each article comes in three difficulty levels: Elementary, Intermediate and Advanced. Each paragraph is annotated with three multiple choice reading comprehension questions. The reading comprehension questions can be answered based on any of the three paragraph levels.

### Supported Tasks and Leaderboards

[Needs More Information]

### Languages

English (`en-US`).

The original Guardian articles were manually converted from British to American English.

## Dataset Structure

### Data Instances

An example instance looks as follows.

```json
{
  "title": "101-Year-Old Bottle Message",
  "paragraph": "Angela Erdmann never knew her grandfather. He died in 1946, six years before she was born. But, on Tuesday 8th April, 2014, she described the extraordinary moment when she received a message in a bottle, 101 years after he had lobbed it into the Baltic Sea. Thought to be the world’s oldest message in a bottle, it was presented to Erdmann by the museum that is now exhibiting it in Germany.",
  "paragraph_index": 1,
  "level": "Adv",
  "question": "How did Angela Erdmann find out about the bottle?",
  "answers": ["A museum told her that they had it", "She coincidentally saw it at the museum where it was held", "She found it in her basement on April 28th, 2014", "A friend told her about it"],
  "a_span": [56, 70],
  "d_span": [16, 34]
}
```

Where,

| Answer | Description                                                | Textual Span    |
|--------|------------------------------------------------------------|-----------------|
| a      | Correct answer.                                            | Critical Span   |
| b      | Incorrect answer. A miscomprehension of the critical span. | Critical Span   |
| c      | Incorrect answer. Refers to an additional span.            | Distractor Span |
| d      | Incorrect answer. Has no textual support.                  | -               |

The order of the answers in the `answers` list corresponds to the order of the answers in the table.

### Data Fields

- `title`: A `string` feature. The article title.
- `paragraph`: A `string` feature. The paragraph from the article.
- `paragraph_index`: An `int` feature. Corresponds to the paragraph index in the article.
- `level`: A class label feature. The difficulty level of the paragraph: `Adv`, `Int` or `Ele`.
- `question`: A `string` feature. The given question.
- `answers`: A list of `string` features containing the four possible answers.
- `a_span`: A list of start and end indices (inclusive) of the critical span.
- `d_span`: A list of start and end indices (inclusive) of the distractor span.

\* Span indices are according to word positions after whitespace tokenization.

\*\* In the rare case where a span is spread over multiple sections, the span list will contain multiple instances of start and stop indices in the format: [start_1, stop_1, start_2, stop_2,...].

### Data Splits

- Articles: 30
- Paragraphs: 162
- Questions: 486
- Question-Paragraph Level pairs: 1,458

No preconfigured split is currently provided.

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

The annotation and piloting process of the dataset is described in Appendix A in [STARC: Structured Annotations for Reading Comprehension](https://aclanthology.org/2020.acl-main.507.pdf).

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

[Needs More Information]

### Licensing Information

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

### Citation Information

[STARC: Structured Annotations for Reading Comprehension](http://people.csail.mit.edu/berzak/papers/acl2020.pdf)

```
@inproceedings{starc2020,
  author = {Berzak, Yevgeni and Malmaud, Jonathan and Levy, Roger},
  title = {STARC: Structured Annotations for Reading Comprehension},
  booktitle = {ACL},
  year = {2020},
  publisher = {Association for Computational Linguistics}
}
```

### Contributions

Thanks to [@scaperex](https://github.com/scaperex) for adding this dataset.

Provider:

malmaud

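Summary of Original Information

Dataset Overview

Basic Information

Name: OneStopQA
Language: English (en-US)
License: CC BY-SA 4.0
Dataset size: 1,423,090 bytes
Download size: 118,173 bytes
Number of instances: 1,458

Dataset Description

OneStopQA is a multiple-choice reading comprehension dataset annotated according to the STARC (Structured Annotations for Reading Comprehension) scheme. The reading materials are Guardian articles taken from the OneStopEnglish corpus. Each article comes in three difficulty levels: Elementary, Intermediate and Advanced. Each paragraph is accompanied by three multiple-choice reading comprehension questions.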

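Data Structure

Data Fields

title: The article title; string.
paragraph: The paragraph from the article; string.
paragraph_index: The index of the paragraph within the article; integer.
level: The difficulty level of the paragraph; class label, one of Adv (Advanced), Int (Intermediate) and Ele (Elementary).
question: The reading comprehension question; string.
answers: A list of the four possible answers; strings.
a_span: The critical span that supports the correct answer; list of integers.
d_span: The distractor span; list of integers.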
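
The span fields can be turned back into text with a few lines of Python. The sketch below follows the dataset card's description (word positions after whitespace tokenization, inclusive end index, optional multiple start/stop pairs). It assumes the indices are zero-based, which is consistent with the card's example instance; the helper name `span_text` is hypothetical and not part of the dataset or any library.

```python
def span_text(paragraph: str, span: list[int]) -> str:
    """Return the words covered by a span list of inclusive (start, stop) pairs."""
    words = paragraph.split()  # whitespace tokenization, as the card describes
    pieces = []
    # A span may hold several pairs: [start_1, stop_1, start_2, stop_2, ...]
    for start, stop in zip(span[0::2], span[1::2]):
        # Assumption: indices are zero-based; the stop index is inclusive.
        pieces.append(" ".join(words[start:stop + 1]))
    return " ... ".join(pieces)


# The example instance from the dataset card.
paragraph = (
    "Angela Erdmann never knew her grandfather. He died in 1946, six years "
    "before she was born. But, on Tuesday 8th April, 2014, she described the "
    "extraordinary moment when she received a message in a bottle, 101 years after "
    "he had lobbed it into the Baltic Sea. Thought to be the world’s oldest message "
    "in a bottle, it was presented to Erdmann by the museum that is now exhibiting "
    "it in Germany."
)

critical = span_text(paragraph, [56, 70])    # a_span
distractor = span_text(paragraph, [16, 34])  # d_span
print(critical)
print(distractor)
```

For this instance the critical span is the clause about the museum presenting the bottle, which is what the correct answer paraphrases.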

Data Splits

Train: 1,458 instances

Dataset Creation

Annotation Process

The annotation and piloting process is described in detail in Appendix A of STARC: Structured Annotations for Reading Comprehension.

Considerations for Using the Data

Licensing Information

The dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Citation Information

@inproceedings{starc2020,
  author = {Berzak, Yevgeni and Malmaud, Jonathan and Levy, Roger},
  title = {STARC: Structured Annotations for Reading Comprehension},
  booktitle = {ACL},
  year = {2020},
  publisher = {Association for Computational Linguistics}
}

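Collected Summary

Dataset Introduction

Construction

Reading comprehension tasks in natural language processing place high demands on data quality. OneStopQA is built on the STARC (Structured Annotations for Reading Comprehension) framework, which provides a systematic annotation scheme for multiple-choice reading comprehension. The source texts are Guardian articles taken from the OneStopEnglish corpus, each available at three difficulty levels: Elementary, Intermediate and Advanced. Expert annotators wrote three multiple-choice questions for each paragraph and marked the critical span and the distractor span according to the STARC guidelines, tying each question closely to the text. The annotation process was designed to produce annotations with an explicit structure, giving a reliable basis for model training.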

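Features

The dataset's core feature is its structured annotation scheme. Each instance contains the question and the answer options together with the critical span and the distractor span in the text, which offers a transparent window into a model's reasoning. The data covers three explicit difficulty levels, so studies can target different levels of language proficiency. All articles were converted from British to American English, which keeps the language style consistent. The dataset is moderate in size: 1,458 question-paragraph level pairs, drawn from 162 paragraphs in 30 articles, which suits controlled experiments.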
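
The counts reported in the dataset card are consistent with one another, given three questions per paragraph and three difficulty levels per paragraph:

```latex
\begin{aligned}
162 \text{ paragraphs} \times 3 \text{ questions per paragraph} &= 486 \text{ questions},\\
486 \text{ questions} \times 3 \text{ levels} &= 1{,}458 \text{ question-paragraph level pairs}.
\end{aligned}
```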

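Usage

The dataset is mainly used to train and evaluate multiple-choice reading comprehension models. Users can load the dataset and access its structured fields: the paragraph text, the question, the four candidate answers and the annotated text spans. A model predicts an answer from the paragraph and the question, and its performance is measured against the annotated answer. The span annotations (a_span, d_span) are particularly suited to interpretability research, for example checking whether a model relies on the correct textual evidence. Because the dataset has no preset train/test split, users need to design their own cross-validation or hold-out strategy to obtain robust evaluation results.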
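
One possible hold-out strategy is sketched below. It is an assumption of this example, not a recommendation from the dataset authors: because each question appears once per difficulty level, a row-level random split would put near-duplicates on both sides, so the sketch holds out whole articles by `title`. The function name `split_by_article` and the toy rows are hypothetical; real rows could come, for instance, from `datasets.load_dataset("malmaud/onestop_qa", split="train")` if the Hugging Face `datasets` library is installed.

```python
import random
from collections import defaultdict


def split_by_article(rows, test_fraction=0.2, seed=0):
    """Hold out whole articles so that no article appears on both sides."""
    by_title = defaultdict(list)
    for row in rows:
        by_title[row["title"]].append(row)

    titles = sorted(by_title)               # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(titles)
    n_test = max(1, round(len(titles) * test_fraction))

    test = [r for t in titles[:n_test] for r in by_title[t]]
    train = [r for t in titles[n_test:] for r in by_title[t]]
    return train, test


# Toy stand-in for the dataset: 10 articles, 3 questions each, 3 levels each.
rows = [
    {"title": f"Article {i}", "level": level, "question": f"Q{q}"}
    for i in range(10)
    for level in (0, 1, 2)
    for q in range(3)
]

train, test = split_by_article(rows)
print(len(train), len(test))
```

The same grouping idea extends to cross-validation by rotating which block of titles is held out.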

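Background and Challenges

Background Overview

OneStopQA was created in 2020 by Yevgeni Berzak, Jonathan Malmaud and Roger Levy, researchers at MIT and other institutions. Its core research question is the fine-grained evaluation of reading comprehension. The dataset is built on the STARC (Structured Annotations for Reading Comprehension) scheme and uses texts at several difficulty levels, paired with multiple-choice questions, to examine how models reason at different levels of linguistic complexity. The source texts are Guardian articles that experts manually adapted to three difficulty levels (Elementary, Intermediate and Advanced). The dataset gives natural language processing a benchmark for evaluating model generalization and robustness, and has pushed reading comprehension research toward fine-grained analysis.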

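Current Challenges

OneStopQA addresses the challenge of evaluating how well reading comprehension models adapt to text difficulty. Its multi-level design requires models to handle surface information and also to understand complex syntactic and semantic structure. During construction, experts had to grade the same article by difficulty and annotate questions so that each question stays logically consistent with the paragraph at every level. They also had to mark the critical and distractor spans precisely, so that a model's grasp of textual detail can be told apart from its erroneous reasoning patterns. This process places high demands on the annotators' depth of language understanding and on their consistency.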

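Common Scenarios

Classic Use Cases

In natural language processing, reading comprehension tasks assess how well a model understands the deeper meaning of a text. With its carefully designed structured annotation scheme, OneStopQA serves as a classic example for multiple-choice reading comprehension research. The dataset is built from Guardian articles; each paragraph comes in versions at three difficulty levels and is paired with multiple-choice questions that the model must answer from the given paragraph. This design lets researchers examine model reasoning systematically across levels of text complexity, and it is especially suited to studying how well a model extracts key information and discriminates distractors.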

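Academic Problems Addressed

By introducing the STARC annotation framework, the dataset addresses the lack of answer interpretability in reading comprehension research. Traditional datasets often provide only answer labels, whereas OneStopQA marks the correct answer and also identifies the critical text span that supports it and the distractor text span. This fine-grained annotation lets researchers analyze the basis of a model's decisions and distinguish semantic understanding from surface pattern matching. The dataset has advanced interpretable reading comprehension models and provides a standardized test platform for evaluating model robustness and resistance to distractors.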
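
One way to use this structure is to tally which kind of answer a model picks, since the dataset card states that the order of the `answers` list follows the STARC answer table (correct, miscomprehension of the critical span, distractor span, no textual support). The sketch below is illustrative only: the label strings, the `tally_choices` helper and the prediction list are hypothetical, and predictions are assumed to be expressed as indices into the original, unshuffled `answers` list.

```python
from collections import Counter

# Position in `answers` -> answer type, following the dataset card's table.
ANSWER_TYPES = (
    "correct",                 # a: supported by the critical span
    "misread_critical_span",   # b: miscomprehension of the critical span
    "distractor_span",         # c: refers to the distractor span
    "no_support",              # d: no textual support
)


def tally_choices(predicted_indices):
    """Count how often each STARC answer type was chosen."""
    counts = Counter(ANSWER_TYPES[i] for i in predicted_indices)
    return {name: counts.get(name, 0) for name in ANSWER_TYPES}


# Hypothetical model predictions for eight questions.
predictions = [0, 0, 1, 2, 0, 3, 0, 1]
summary = tally_choices(predictions)
accuracy = summary["correct"] / len(predictions)
print(summary, accuracy)
```

Because the correct answer always sits at index 0 in the stored list, the options would need to be shuffled before being shown to a model and mapped back to these indices afterwards.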

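Derived Related Work

The dataset's structured annotations have given rise to several lines of research. The original STARC paper sets out the theoretical basis and practical value of the annotation framework, laying the methodological groundwork for later fine-grained reading comprehension research. Some studies focus on transfer learning across difficulty levels, exploring how well models generalize knowledge from simple texts to complex ones. Other work uses the critical-span and distractor-span annotations to develop attention visualization tools, deepening the understanding of how neural networks reach decisions. Together, this work has advanced explainable AI in natural language processing.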

The content above was collected and summarized by 遇见数据集.
