imppres
ModelScope community. Updated 2025-11-27; indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/facebook/imppres
# Dataset Card for IMPPRES
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [Github](https://github.com/facebookresearch/Imppres)
- **Repository:** [Github](https://github.com/facebookresearch/Imppres)
- **Paper:** [Aclweb](https://www.aclweb.org/anthology/2020.acl-main.768)
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
IMPPRES contains over 25k semiautomatically generated sentence pairs illustrating well-studied pragmatic inference types. It is an NLI dataset following the format of SNLI (Bowman et al., 2015), MultiNLI (Williams et al., 2018) and XNLI (Conneau et al., 2018), created to evaluate how well trained NLI models recognize several classes of presuppositions and scalar implicatures.
### Supported Tasks and Leaderboards
Natural Language Inference.
### Languages
English.
## Dataset Structure
### Data Instances
The data consists of two configurations, implicature and presupposition.
Each configuration consists of several sub-datasets:
**Presupposition**
- all_n_presupposition
- change_of_state
- cleft_uniqueness
- possessed_definites_existence
- question_presupposition
- both_presupposition
- cleft_existence
- only_presupposition
- possessed_definites_uniqueness
**Implicature**
- connectives
- gradable_adjective
- gradable_verb
- modals
- numerals_10_100
- numerals_2_3
- quantifiers
Each sentence type in IMPPRES is generated according to a template that specifies the linear order of the constituents in the sentence. The constituents are sampled from a vocabulary of over 3000 lexical items annotated with grammatical features needed to ensure wellformedness. We semiautomatically generate IMPPRES using a codebase developed by Warstadt et al. (2019a) and significantly expanded for the BLiMP dataset (Warstadt et al., 2019b).
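To make the idea of template-based generation concrete, here is a toy sketch in Python. It is not the codebase of Warstadt et al.: the vocabulary, the feature names (`category`, `number`), the template, and the helper names `sample` and `generate_pair` are all invented for illustration. It only shows the general pattern of filling a fixed constituent order with lexical items whose features are checked for agreement.

```python
import random

# Toy vocabulary: each item carries the grammatical features the template needs.
# These entries are invented for illustration; they are not the IMPPRES lexicon.
VOCAB = [
    {"form": "teenager", "category": "noun", "number": "sg"},
    {"form": "teenagers", "category": "noun", "number": "pl"},
    {"form": "that", "category": "det", "number": "sg"},
    {"form": "those", "category": "det", "number": "pl"},
    {"form": "yell", "category": "verb"},
    {"form": "smile", "category": "verb"},
]


def sample(rng, **features):
    """Pick one vocabulary item whose features match every requested value."""
    matches = [w for w in VOCAB if all(w.get(k) == v for k, v in features.items())]
    return rng.choice(matches)


def generate_pair(rng):
    """Fill the fixed template DET NOUN could(n't) VERB for both sentences."""
    noun = sample(rng, category="noun")
    # The determiner must agree in number with the noun (well-formedness check).
    det = sample(rng, category="det", number=noun["number"])
    verb = sample(rng, category="verb")
    subject = f"{det['form'].capitalize()} {noun['form']}"
    return {
        "sentence1": f"{subject} couldn't {verb['form']}.",
        "sentence2": f"{subject} could {verb['form']}.",
        "gold_label": "contradiction",  # a sentence and its negation contradict
    }


pair = generate_pair(random.Random(0))  # seeded so the run is repeatable
print(pair)
```

Because the label follows from the template itself, every pair produced this way comes with a gold label and needs no manual annotation.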
Here is an instance of the raw presupposition data from any sub-dataset:
```buildoutcfg
{
"sentence1": "All ten guys that proved to boast might have been divorcing.",
"sentence2": "There are exactly ten guys that proved to boast.",
"trigger": "modal",
"presupposition": "positive",
"gold_label": "entailment",
"UID": "all_n_presupposition",
"pairID": "9e",
"paradigmID": 0
}
```
and the raw implicature data from any sub-dataset:
```buildoutcfg
{
"sentence1": "That teenager couldn't yell.",
"sentence2": "That teenager could yell.",
"gold_label_log": "contradiction",
"gold_label_prag": "contradiction",
"spec_relation": "negation",
"item_type": "control",
"trigger": "modal",
"lexemes": "can - have to"
}
```
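The two record shapes above can be told apart by their label fields. The following is a minimal sketch that assumes the raw records are stored one JSON object per line; that storage format is an assumption, and the helper names `read_records` and `record_kind` are hypothetical. The two records are the examples shown above.

```python
import json

# The two example records from this card, written as JSON Lines (assumed format).
RAW_TEXT = """\
{"sentence1": "All ten guys that proved to boast might have been divorcing.", "sentence2": "There are exactly ten guys that proved to boast.", "trigger": "modal", "presupposition": "positive", "gold_label": "entailment", "UID": "all_n_presupposition", "pairID": "9e", "paradigmID": 0}
{"sentence1": "That teenager couldn't yell.", "sentence2": "That teenager could yell.", "gold_label_log": "contradiction", "gold_label_prag": "contradiction", "spec_relation": "negation", "item_type": "control", "trigger": "modal", "lexemes": "can - have to"}
"""


def read_records(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def record_kind(record):
    """Classify a raw record by which gold-label fields it carries."""
    if "gold_label_log" in record and "gold_label_prag" in record:
        return "implicature"
    if "gold_label" in record:
        return "presupposition"
    raise ValueError("record has no recognizable gold-label field")


records = read_records(RAW_TEXT)
for record in records:
    print(record_kind(record), "|", record["sentence1"])
```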
### Data Fields
**Presupposition**
The field names in the raw presupposition sub-datasets differ slightly from the fields that appear in the HuggingFace Datasets version.
In the HF Dataset, each field (left) is filled from the raw field on the right, or with `Not_In_Example` where indicated:
```buildoutcfg
"premise" -> "sentence1"
"hypothesis"-> "sentence2"
"trigger" -> "trigger" or "Not_In_Example"
"trigger1" -> "trigger1" or "Not_In_Example"
"trigger2" -> "trigger2" or "Not_In_Example"
"presupposition" -> "presupposition" or "Not_In_Example"
"gold_label" -> "gold_label"
"UID" -> "UID"
"pairID" -> "pairID"
"paradigmID" -> "paradigmID"
```
Most of the raw fields remain unchanged. The various `trigger` fields, however, required a new mapping.
Some examples in the dataset have only the `trigger` field, while others have the `trigger1` and `trigger2` fields without the `trigger` or `presupposition` field.
Most examples look like the one in the Data Instances section above. Occasionally, however, an example will look like this:
```buildoutcfg
{
'sentence1': 'Did that committee know when Lissa walked through the cafe?',
'sentence2': 'That committee knew when Lissa walked through the cafe.',
'trigger1': 'interrogative',
'trigger2': 'unembedded',
'gold_label': 'neutral',
'control_item': True,
'UID': 'question_presupposition',
'pairID': '1821n',
'paradigmID': 95
}
```
In this example, `trigger1` and `trigger2` appear and `presupposition` and `trigger` are removed. This maintains the length of the dictionary.
To account for these examples, we have thus introduced the mapping above such that all examples accessed through the HF Datasets interface will have the same size as well as the same fields.
In the event that an example does not have a value for one of the fields, the field is maintained in the dictionary but given a value of `Not_In_Example`.
To illustrate this point, the example given in the Data Instances section above would look like the following in the HF Datasets:
```buildoutcfg
{
"premise": "All ten guys that proved to boast might have been divorcing.",
"hypothesis": "There are exactly ten guys that proved to boast.",
"trigger": "modal",
"trigger1": "Not_In_Example",
"trigger2": "Not_In_Example",
"presupposition": "positive",
"gold_label": "entailment",
"UID": "all_n_presupposition",
"pairID": "9e",
"paradigmID": 0
}
```
```
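The normalization described above can be sketched in a few lines of Python. This is an illustration, not the loader's actual code: the function name `to_uniform` and the constant names are hypothetical, and dropping `control_item` is an assumption based on that key not appearing in the mapping.

```python
MISSING = "Not_In_Example"

# Raw name -> HF name for the two renamed fields.
RENAMED = {"sentence1": "premise", "sentence2": "hypothesis"}
# Fields that only some raw examples carry; absent ones get the placeholder.
OPTIONAL = ("trigger", "trigger1", "trigger2", "presupposition")
# Fields copied through unchanged.
UNCHANGED = ("gold_label", "UID", "pairID", "paradigmID")


def to_uniform(raw):
    """Return a record with the same ten fields for every example."""
    out = {new: raw[old] for old, new in RENAMED.items()}
    for key in OPTIONAL:
        out[key] = raw.get(key, MISSING)
    for key in UNCHANGED:
        out[key] = raw[key]
    # Assumption: keys outside the mapping (e.g. control_item) are not carried over.
    return out


question_example = {
    "sentence1": "Did that committee know when Lissa walked through the cafe?",
    "sentence2": "That committee knew when Lissa walked through the cafe.",
    "trigger1": "interrogative",
    "trigger2": "unembedded",
    "gold_label": "neutral",
    "control_item": True,
    "UID": "question_presupposition",
    "pairID": "1821n",
    "paradigmID": 95,
}

uniform = to_uniform(question_example)
print(uniform["trigger"], uniform["trigger1"], uniform["presupposition"])
```

When filtering on `trigger` or `presupposition`, treat `Not_In_Example` as a missing value rather than as a category of its own.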
Below is a description of the fields:
```buildoutcfg
"premise": The premise.
"hypothesis": The hypothesis.
"trigger": A detailed discussion of trigger types appears in the paper.
"trigger1": A detailed discussion of trigger types appears in the paper.
"trigger2": A detailed discussion of trigger types appears in the paper.
"presupposition": positive or negative.
"gold_label": Corresponds to entailment, contradiction, or neutral.
"UID": Unique id.
"pairID": Sentence pair ID.
"paradigmID": ?
```
It is not immediately clear what the difference between `trigger`, `trigger1`, and `trigger2` is, or what `paradigmID` refers to.
**Implicature**
For the `implicature` configuration, only two fields are renamed (HF field on the left, raw field on the right):
```buildoutcfg
"premise" -> "sentence1"
"hypothesis"-> "sentence2"
```
Here is a description of the fields:
```buildoutcfg
"premise": The premise.
"hypothesis": The hypothesis.
"gold_label_log": Gold label for a logical reading of the sentence pair.
"gold_label_prag": Gold label for a pragmatic reading of the sentence pair.
"spec_relation": ?
"item_type": ?
"trigger": A detailed discussion of trigger types appears in the paper.
"lexemes": ?
```
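Because each implicature pair carries two gold labels, a model's predictions can be scored once against the logical reading and once against the pragmatic reading. The sketch below shows one way to do that; the function name `accuracy_by_reading` is hypothetical, the first record uses the labels of the control item shown in Data Instances, and the second record is a made-up placeholder whose labels differ, included only so the two scores diverge.

```python
def accuracy_by_reading(examples, predictions):
    """Score predicted labels against both gold-label fields."""
    if not examples or len(examples) != len(predictions):
        raise ValueError("need one prediction per example, and at least one example")
    correct = {"logical": 0, "pragmatic": 0}
    for example, predicted in zip(examples, predictions):
        correct["logical"] += predicted == example["gold_label_log"]
        correct["pragmatic"] += predicted == example["gold_label_prag"]
    return {reading: hits / len(examples) for reading, hits in correct.items()}


examples = [
    # Labels of the control item shown in Data Instances.
    {"gold_label_log": "contradiction", "gold_label_prag": "contradiction"},
    # Hypothetical item (not taken from the dataset) where the readings disagree.
    {"gold_label_log": "neutral", "gold_label_prag": "entailment"},
]
predictions = ["contradiction", "entailment"]  # stand-in model outputs

scores = accuracy_by_reading(examples, predictions)
print(scores)  # {'logical': 0.5, 'pragmatic': 1.0}
```

A model that scores higher on the pragmatic reading than on the logical one is drawing the implicature rather than treating the pair literally.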
### Data Splits
As the dataset was created to test already trained models, the only split that exists is for testing.
## Dataset Creation
### Curation Rationale
IMPPRES was created to evaluate how well trained NLI models recognize several classes of presuppositions and scalar implicatures.
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
The annotations were generated semi-automatically.
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
IMPPRES is available under a Creative Commons Attribution-NonCommercial 4.0 International Public License ("The License"). You may not use these files except in compliance with the License. Please see the LICENSE file for more information before you use the dataset.
### Citation Information
```buildoutcfg
@inproceedings{jeretic-etal-2020-natural,
title = "Are Natural Language Inference Models {IMPPRESsive}? {L}earning {IMPlicature} and {PRESupposition}",
author = "Jereti\v{c}, Paloma and
Warstadt, Alex and
Bhooshan, Suvrat and
Williams, Adina",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.768",
doi = "10.18653/v1/2020.acl-main.768",
pages = "8690--8705",
abstract = "Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by {``}some{''} as entailments. For some presupposition triggers like {``}only{''}, BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.",
}
```
### Contributions
Thanks to [@aclifton314](https://github.com/aclifton314) for adding this dataset.
Provider: maas
Created: 2025-05-20