ropes

Name: ropes
Creator: maas
Published: 2025-07-11 16:29:36
License: No description available

ModelScope Community. Updated 2025-07-11; indexed 2025-05-31.

Download link:

https://modelscope.cn/datasets/allenai/ropes

Resource summary:

# Dataset Card for ROPES

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [ROPES dataset](https://allenai.org/data/ropes)
- **Paper:** [Reasoning Over Paragraph Effects in Situations](https://arxiv.org/abs/1908.05852)
- **Leaderboard:** [ROPES leaderboard](https://leaderboard.allenai.org/ropes)

### Dataset Summary

ROPES (Reasoning Over Paragraph Effects in Situations) is a QA dataset that tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations (e.g., "animal pollinators increase efficiency of fertilization in flowers"), a novel situation that uses this background, and questions that require reasoning about the effects of the relationships in the background passage in the context of the situation.

### Supported Tasks and Leaderboards

The reading comprehension task is framed as an extractive question answering problem.
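
Because the task is extractive, a SQuAD-style model needs one context string plus a character offset for each gold answer, while each ROPES example stores its `background`, `situation`, and `question` separately and gives only the answer `text`. The following is a minimal sketch under stated assumptions: the helper names `build_context` and `answer_start`, and the question-situation-background ordering, are hypothetical illustrations, not the input format used by the ROPES authors.

```python
from typing import Any, Dict, Optional

# A ROPES example as described in the data card (field names from the card).
Example = Dict[str, Any]


def build_context(example: Example, sep: str = "\n") -> str:
    """Join question, situation, and background into one extractive context.

    This ordering is a hypothetical choice, not the ROPES baseline's: the
    question and situation come first because every gold answer is a span
    of one of them.
    """
    return sep.join(
        [example["question"], example["situation"], example["background"]]
    )


def answer_start(example: Example, context: str) -> Optional[int]:
    """Return a character offset for the gold answer in `context`, or None.

    str.find returns the first occurrence, so a name that appears several
    times is anchored at its earliest mention (question before situation).
    Test-set examples have an empty answer list and yield None.
    """
    texts = example["answers"]["text"]
    if not texts:
        return None  # hidden test-set answer
    start = context.find(texts[0])
    return start if start >= 0 else None


example = {
    "id": "2058517998",
    # Background and situation truncated from the card's example.
    "background": "Cancer is a disease that causes cells to divide out of control.",
    "situation": "Jason recently learned that he has cancer.",
    "question": "Whose cells are dividing more rapidly?",
    "answers": {"text": ["Jason"]},
}
context = build_context(example)
start = answer_start(example, context)
print(context[start:start + len("Jason")])  # prints "Jason"
```

Keeping the question inside the context matters because some answers are spans of the question itself, for instance when the question names the two choices ("more" or "less"). Anchoring at the first occurrence is a simplification; a stricter pipeline might prefer the occurrence in whichever segment the annotator actually used.
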

Models are evaluated by computing word-level F1 and exact match (EM) metrics, following common practice for recent reading comprehension datasets (e.g., SQuAD).

### Languages

The text in the dataset is in English. The associated BCP-47 code is `en`.

## Dataset Structure

### Data Instances

The data closely follows the SQuAD v1.1 format. An example looks like this:

```
{
  "id": "2058517998",
  "background": "Cancer is a disease that causes cells to divide out of control. Normally, the body has systems that prevent cells from dividing out of control. But in the case of cancer, these systems fail. Cancer is usually caused by mutations. Mutations are random errors in genes. Mutations that lead to cancer usually happen to genes that control the cell cycle. Because of the mutations, abnormal cells divide uncontrollably. This often leads to the development of a tumor. A tumor is a mass of abnormal tissue. As a tumor grows, it may harm normal tissues around it. Anything that can cause cancer is called a carcinogen . Carcinogens may be pathogens, chemicals, or radiation.",
  "situation": "Jason recently learned that he has cancer. After hearing this news, he convinced his wife, Charlotte, to get checked out. After running several tests, the doctors determined Charlotte has no cancer, but she does have high blood pressure. Relieved at this news, Jason was now focused on battling his cancer and fighting as hard as he could to survive.",
  "question": "Whose cells are dividing more rapidly?",
  "answers": {
    "text": ["Jason"]
  }
}
```

### Data Fields

- `id`: identifier
- `background`: the background passage
- `situation`: the grounding situation
- `question`: the question to answer
- `answers`: the answer text, which is a span from either the situation or the question. The `text` list always contains a single element.

Note that the answers for the test set are hidden (and thus represented as an empty list). Predictions for the test set should be submitted to the leaderboard.
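
Predictions are scored with the word-level F1 and exact match metrics named under Supported Tasks. The following is a minimal sketch, assuming SQuAD v1.1-style answer normalization (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace). It is not the official ROPES leaderboard evaluator, whose details may differ, and the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD v1.1-style normalization: lowercase, drop punctuation,
    drop the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: Dict[str, str], golds: Dict[str, List[str]]) -> Dict[str, float]:
    """Average EM and F1 (as percentages) over all gold question ids.

    Each gold list must be non-empty; a missing prediction counts as "".
    Taking the max over gold answers mirrors SQuAD; ROPES lists hold one answer.
    """
    em_total = f1_total = 0.0
    for qid, gold_texts in golds.items():
        pred = predictions.get(qid, "")
        em_total += max(exact_match(pred, g) for g in gold_texts)
        f1_total += max(token_f1(pred, g) for g in gold_texts)
    n = len(golds)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


print(score({"2058517998": "Jason"}, {"2058517998": ["Jason"]}))
# prints {'exact_match': 100.0, 'f1': 100.0}
```

Since most ROPES questions offer two sensible choices and answers are short, EM and F1 tend to move together; F1 gives partial credit only when a prediction overlaps the gold span without matching it exactly (e.g., "more cells" against "more").
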

### Data Splits

The dataset contains 14k QA pairs over 1.7k paragraphs, split between train (10k QAs), development (1.6k QAs), and a hidden test partition (1.7k QAs).

## Dataset Creation

### Curation Rationale

From the original paper:

*ROPES challenges reading comprehension models to handle more difficult phenomena: understanding the implications of a passage of text. ROPES is also particularly related to datasets focusing on "multi-hop reasoning", as by construction answering questions in ROPES requires connecting information from multiple parts of a given passage.*

*We constructed ROPES by first collecting background passages from science textbooks and Wikipedia articles that describe causal relationships. We showed the collected paragraphs to crowd workers and asked them to write situations that involve the relationships found in the background passage, and questions that connect the situation and the background using the causal relationships. The answers are spans from either the situation or the question. The dataset consists of 14,322 questions from various domains, mostly in science and economics.*

### Source Data

From the original paper:

*We automatically scraped passages from science textbooks and Wikipedia that contained causal connectives, e.g., "causes," "leads to," and keywords that signal qualitative relations, e.g., "increases," "decreases." We then manually filtered out the passages that do not have at least one relation. The passages can be categorized into physical science (49%), life science (45%), economics (5%) and other (1%). In total, we collected over 1,000 background passages.*

#### Initial Data Collection and Normalization

From the original paper:

*We used Amazon Mechanical Turk (AMT) to generate the situations, questions, and answers. The AMT workers were given background passages and asked to write situations that involved the relation(s) in the background passage. The AMT workers then authored questions about the situation that required both the background and the situation to answer. In each human intelligence task (HIT), AMT workers are given 5 background passages to select from and are asked to create a total of 10 questions. To mitigate the potential for easy lexical shortcuts in the dataset, the workers were encouraged via instructions to write questions in minimal pairs, where a very small change in the question results in a different answer.*

*Most questions are designed to have two sensible answer choices (e.g., "more" vs. "less").*

To reduce annotator bias, the training and evaluation sets are written by different annotators.

#### Who are the source language producers?

[More Information Needed]

### Annotations

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

The data is distributed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

### Citation Information

```
@inproceedings{Lin2019ReasoningOP,
  title={Reasoning Over Paragraph Effects in Situations},
  author={Kevin Lin and Oyvind Tafjord and Peter Clark and Matt Gardner},
  booktitle={MRQA@EMNLP},
  year={2019}
}
```

### Contributions

Thanks to [@VictorSanh](https://github.com/VictorSanh) for adding this dataset.

Provider:

maas

Created:

2025-05-28
