Cause-Effect-Context from Natural Questions (NQ-CE)

Source: NIAID Data Ecosystem (indexed 2026-03-12)

Download link:

https://zenodo.org/record/4000980

Description:

This dataset is derived from the Natural Questions (NQ) dataset, a large benchmark for open question answering research (https://ai.google.com/research/NaturalQuestions). It contains a collection of cause-effect pairs along with their context (the text describing the causal relation between the cause and the effect) and the original question from the NQ dataset. It also contains a collection of "negative" pairs: phrases that are mentioned in the context but have no causal relation.

The dataset is constructed by filtering the NQ dataset for questions that follow a certain pattern indicating that the question is causal. Either the cause or the effect is the (short) answer in the original NQ dataset, and the other side is manually derived from the context.

The data is shared in JSONL format, with each line being a processed NQ question containing the relevant fields described above. In versions 2 and 2.5, each JSON object has the following fields:

phrase1: the first phrase (text span)
phrase2: the second phrase (text span)
label: "causal" means phrase1 causes phrase2; "non_causal" means phrase1 and phrase2 do NOT have a causal relation between them
passage: the context that states that phrase1 causes phrase2 (for causal), or just the passage that contains both phrase1 and phrase2 (for non_causal)
document_url: the Wikipedia URL from the Natural Questions data
question_text: the original question text from the Natural Questions data

Versions 1.5 and 2.5 enforce exactness: causes and effects appear verbatim, exactly as seen in the context passage. This differs from versions 1 and 2, where causes and effects are annotated as per human understanding and manual curation, i.e. how a human being would parse and understand causes and effects in text, with simple grammatical consistency enforced. For example, take the passage:

"Above - average sea water temperatures caused by global warming is the leading cause of coral bleaching."

Versions 1 and 2 could state the cause as something like "above average sea water temperatures", whereas versions 1.5 and 2.5 will always say "Above - average sea water temperatures", i.e. the cause and effect will be exactly as seen in the text. These versions make it easier to match the results of any causal relation extraction (CRE) algorithm or method, e.g. by obtaining the results of CRE using some method X and comparing them against the causes and effects present in the dataset. Hence these versions (1.5, 2.5, 1.5b and 2.5b) are also called Evaluation versions.

Versions 1.5b and 2.5b are the final and most accurate versions and are the ones that should actually be used; the rest are intermediary versions kept for archival purposes. The 1.5b and 2.5b sets are together called the Final Evaluation versions.

License: https://creativecommons.org/licenses/by-sa/3.0/

Contacts:
Gaurav Dass: dassg2 AT rpi.edu
Oktie Hassanazadeh

Created:

2021-06-08
