pietrolesci/robust_nli
Source: Hugging Face. Updated: 2022-04-25. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pietrolesci/robust_nli
Description:
## Overview
The original dataset is available in the authors' [GitHub repo](https://github.com/tyliupku/nli-debiasing-datasets).
This dataset is a collection of NLI benchmarks constructed as described in the paper
[An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference](https://aclanthology.org/2020.conll-1.48/)
published at CoNLL 2020.
## Dataset curation
No specific curation was applied to this dataset. Label encoding follows exactly what the authors report in the paper.
The paper also states:
> _all the following datasets are collected based on the public available resources proposed by their authors, thus the experimental results in this paper are comparable to the numbers reported in the original papers and the other papers that use these datasets_
Most of the included datasets follow the customary 3-class NLI convention `{"entailment": 0, "neutral": 1, "contradiction": 2}`.
However, the following datasets (named here as the splits appear in the code) use a binary label mapping:
- `IS_SD`: `{"non-entailment": 0, "entailment": 1}`
- `LI_TS`: `{"non-contradiction": 0, "contradiction": 1}`
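The mappings above can be captured in a small lookup helper. This is a minimal sketch rather than code from the original repository: the names `THREE_WAY`, `BINARY_SPLITS`, and `decode_label` are hypothetical, and split names are written with underscores (`IS_SD`, `LI_TS`) as in the creation script.

```python
# Illustrative helper (not from the original repo): decode integer labels per split.
THREE_WAY = ["entailment", "neutral", "contradiction"]

# Splits with a binary label set; every other split is assumed to use THREE_WAY.
BINARY_SPLITS = {
    "IS_SD": ["non-entailment", "entailment"],
    "LI_TS": ["non-contradiction", "contradiction"],
}


def decode_label(split: str, label_id: int) -> str:
    """Return the label name for `label_id` in the given split."""
    names = BINARY_SPLITS.get(split, THREE_WAY)
    if not 0 <= label_id < len(names):
        raise ValueError(f"label {label_id} is out of range for split {split!r}")
    return names[label_id]
```

Note that the same integer means different things across splits: label `1` is `entailment` in `IS_SD` but `neutral` in a 3-class split such as `PI_CD`, so predictions should be decoded per split before they are compared.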
## Dataset structure
This benchmark includes 10 adversarial datasets. To provide more insight into how the adversarial
datasets attack the models, the authors categorized them according to the bias(es) they test and renamed
them accordingly. See Section 2 of the paper for details.
A mapping to the original dataset names is provided below.
| | Name | Original Name | Original Paper | Original Curation |
|---:|:-------|:-----------------------|:--------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | PI-CD | SNLI-Hard | [Gururangan et al. (2018)](https://aclanthology.org/N18-2017/) | SNLI test set instances that cannot be correctly classified by a neural classifier (fastText) trained on only the hypothesis sentences. |
| 1 | PI-SP | MNLI-Hard | [Liu et al. (2020)](https://aclanthology.org/2020.lrec-1.846/) | MNLI-mismatched dev set instances that cannot be correctly classified by surface patterns that are highly correlated with the labels. |
| 2 | IS-SD | HANS | [McCoy et al. (2019)](https://aclanthology.org/P19-1334/) | Dataset that tests lexical overlap, subsequence, and constituent heuristics between the hypothesis and premise sentences. |
| 3 | IS-CS | SoSwap-AddAMod | [Nie et al. (2019)](https://dl.acm.org/doi/abs/10.1609/aaai.v33i01.33016867) | Pairs of sentences whose logical relations cannot be extracted from lexical information alone. Premises are taken from the SNLI dev set and modified. The original paper assigns a Lexically Misleading Score (LMS) to each instance. Here, only the subset with LMS > 0.7 is reported. |
| 4 | LI-LI | Stress tests (antonym) | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) and [Glockner et al. (2018)](https://aclanthology.org/P18-2103/) | Merge of the 'antonym' category in Naik et al. (2018) (from MNLI matched and mismatched dev sets) and Glockner et al. (2018) (SNLI training set). |
| 5 | LI-TS | Created by the authors | Created by the authors | Swap the two sentences in the original MultiNLI mismatched dev set. If the gold label is 'contradiction', the corresponding label in the swapped instance remains unchanged; otherwise it becomes 'non-contradiction'. |
| 6 | ST-WO | Word overlap | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Word overlap' category in Naik et al. (2018). |
| 7 | ST-NE | Negation | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Negation' category in Naik et al. (2018). |
| 8 | ST-LM | Length mismatch | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Length mismatch' category in Naik et al. (2018). |
| 9 | ST-SE | Spelling errors | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Spelling errors' category in Naik et al. (2018). |
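The LI-TS construction rule in the table (swap the two sentences, keep `contradiction`, and map every other label to the non-contradiction class) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `swap_instance` is hypothetical, and the output label string `non-contradiction` follows the label mapping given earlier in this card.

```python
# Sketch of the LI-TS rule described in the table (hypothetical helper, not the authors' code).
def swap_instance(premise: str, hypothesis: str, label: str) -> dict:
    """Swap premise and hypothesis and recompute the binary LI-TS label.

    'contradiction' is kept as is; any other gold label
    (e.g. 'entailment' or 'neutral') becomes 'non-contradiction'.
    """
    new_label = "contradiction" if label == "contradiction" else "non-contradiction"
    return {"premise": hypothesis, "hypothesis": premise, "label": new_label}
```

The rule relies on contradiction being symmetric: if two sentences contradict each other, they still do after swapping. Entailment and neutrality give no such guarantee about the reverse direction, which is why both collapse into a single class here.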
## Code to create the dataset
```python
import ast

import pandas as pd
from datasets import Dataset, ClassLabel, Value, Features, DatasetDict

Tri_dataset = ["IS_CS", "LI_LI", "PI_CD", "PI_SP", "ST_LM", "ST_NE", "ST_SE", "ST_WO"]
Ent_bin_dataset = ["IS_SD"]
Con_bin_dataset = ["LI_TS"]

# read data: one record per line
with open("<path to file>/robust_nli.txt", encoding="utf-8", mode="r") as fl:
    f = fl.read().strip().split("\n")
    # parse each line as a Python literal (safer than calling eval on file contents)
    f = [ast.literal_eval(i) for i in f]
df = pd.DataFrame.from_dict(f)
# rename to map common names
df = df.rename(columns={"prem": "premise", "hypo": "hypothesis"})
# reorder columns
df = df.loc[:, ["idx", "split", "premise", "hypothesis", "label"]]
# create split-specific features
Tri_features = Features(
    {
        "idx": Value(dtype="int64"),
        "premise": Value(dtype="string"),
        "hypothesis": Value(dtype="string"),
        "label": ClassLabel(num_classes=3, names=["entailment", "neutral", "contradiction"]),
    }
)
Ent_features = Features(
    {
        "idx": Value(dtype="int64"),
        "premise": Value(dtype="string"),
        "hypothesis": Value(dtype="string"),
        "label": ClassLabel(num_classes=2, names=["non-entailment", "entailment"]),
    }
)
Con_features = Features(
    {
        "idx": Value(dtype="int64"),
        "premise": Value(dtype="string"),
        "hypothesis": Value(dtype="string"),
        "label": ClassLabel(num_classes=2, names=["non-contradiction", "contradiction"]),
    }
)
# convert to datasets
dataset_splits = {}
for split in df["split"].unique():
    print(split)
    df_split = df.loc[df["split"] == split].copy()
    if split in Tri_dataset:
        df_split["label"] = df_split["label"].map({"entailment": 0, "neutral": 1, "contradiction": 2})
        ds = Dataset.from_pandas(df_split, features=Tri_features)
    elif split in Ent_bin_dataset:
        df_split["label"] = df_split["label"].map({"non-entailment": 0, "entailment": 1})
        ds = Dataset.from_pandas(df_split, features=Ent_features)
    elif split in Con_bin_dataset:
        df_split["label"] = df_split["label"].map({"non-contradiction": 0, "contradiction": 1})
        ds = Dataset.from_pandas(df_split, features=Con_features)
    else:
        # unknown split: skip it rather than storing a stale or undefined `ds`
        print("ERROR:", split)
        continue
    dataset_splits[split] = ds

datasets = DatasetDict(dataset_splits)
datasets.push_to_hub("pietrolesci/robust_nli", token="<your token>")

# check overlap between splits
from itertools import combinations

for i, j in combinations(datasets.keys(), 2):
    print(
        f"{i} - {j}: ",
        pd.merge(
            datasets[i].to_pandas(),
            datasets[j].to_pandas(),
            on=["premise", "hypothesis", "label"],
            how="inner",
        ).shape[0],
    )
#> PI_SP - ST_LM: 0
#> PI_SP - ST_NE: 0
#> PI_SP - IS_CS: 0
#> PI_SP - LI_TS: 1
#> PI_SP - LI_LI: 0
#> PI_SP - ST_SE: 0
#> PI_SP - PI_CD: 0
#> PI_SP - IS_SD: 0
#> PI_SP - ST_WO: 0
#> ST_LM - ST_NE: 0
#> ST_LM - IS_CS: 0
#> ST_LM - LI_TS: 0
#> ST_LM - LI_LI: 0
#> ST_LM - ST_SE: 0
#> ST_LM - PI_CD: 0
#> ST_LM - IS_SD: 0
#> ST_LM - ST_WO: 0
#> ST_NE - IS_CS: 0
#> ST_NE - LI_TS: 0
#> ST_NE - LI_LI: 0
#> ST_NE - ST_SE: 0
#> ST_NE - PI_CD: 0
#> ST_NE - IS_SD: 0
#> ST_NE - ST_WO: 0
#> IS_CS - LI_TS: 0
#> IS_CS - LI_LI: 0
#> IS_CS - ST_SE: 0
#> IS_CS - PI_CD: 0
#> IS_CS - IS_SD: 0
#> IS_CS - ST_WO: 0
#> LI_TS - LI_LI: 0
#> LI_TS - ST_SE: 0
#> LI_TS - PI_CD: 0
#> LI_TS - IS_SD: 0
#> LI_TS - ST_WO: 0
#> LI_LI - ST_SE: 0
#> LI_LI - PI_CD: 0
#> LI_LI - IS_SD: 0
#> LI_LI - ST_WO: 0
#> ST_SE - PI_CD: 0
#> ST_SE - IS_SD: 0
#> ST_SE - ST_WO: 0
#> PI_CD - IS_SD: 0
#> PI_CD - ST_WO: 0
#> IS_SD - ST_WO: 0
```
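The reading step of the script can be tried without the real file. The sketch below assumes, based on the column names the script renames and selects, that each line of `robust_nli.txt` is a Python dict literal with the keys `idx`, `split`, `prem`, `hypo`, and `label`; the two sample records and the function name `parse_records` are made up for illustration.

```python
import ast
from collections import defaultdict


def parse_records(lines):
    """Parse one dict literal per line and group the records by their `split` field."""
    by_split = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = ast.literal_eval(line)  # accepts literals only; never executes code
        by_split[record["split"]].append(
            {
                "idx": record["idx"],
                "premise": record["prem"],
                "hypothesis": record["hypo"],
                "label": record["label"],
            }
        )
    return dict(by_split)


# Two made-up records, for illustration only (not taken from the dataset).
sample = [
    "{'idx': 0, 'split': 'IS_SD', 'prem': 'A cat sat.', 'hypo': 'A cat sat down.', 'label': 'non-entailment'}",
    "{'idx': 1, 'split': 'LI_TS', 'prem': 'It rained.', 'hypo': 'It was dry.', 'label': 'contradiction'}",
]
grouped = parse_records(sample)
print({split: len(rows) for split, rows in grouped.items()})
```

Grouping by `split` before building any `Dataset` makes it easy to confirm that every split name in the file belongs to one of the three label schemes the script handles.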
Provider:
pietrolesci
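## Summary of original information
### Dataset overview
- Source: the original dataset comes from the authors' GitHub repository.
- Purpose: the dataset supports the paper *An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference*, published at CoNLL 2020.
### Dataset curation
- Label encoding: follows what the authors report in the paper.
- Special processing: none; all datasets are collected from the public resources released by their original authors.
- Label mapping:
  - `IS_SD`: `{"non-entailment": 0, "entailment": 1}`
  - `LI_TS`: `{"non-contradiction": 0, "contradiction": 1}`
### Dataset structure
- Composition: 10 adversarial datasets, renamed according to the type of bias they test.
- Detailed mapping: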
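
| Index | Name | Original name | Original paper | Original curation |
|---:|:---|:---|:---|:---|
| 0 | PI-CD | SNLI-Hard | Gururangan et al. (2018) | SNLI test set instances that cannot be correctly classified by a neural classifier (fastText) trained on only the hypothesis sentences. |
| 1 | PI-SP | MNLI-Hard | Liu et al. (2020) | MNLI-mismatched dev set instances that cannot be correctly classified by surface patterns that are highly correlated with the labels. |
| 2 | IS-SD | HANS | McCoy et al. (2019) | Dataset that tests lexical overlap, subsequence, and constituent heuristics between the hypothesis and premise sentences. |
| 3 | IS-CS | SoSwap-AddAMod | Nie et al. (2019) | Sentence pairs whose logical relations cannot be extracted from lexical information alone. Premises are taken from the SNLI dev set and modified. The original paper assigns a Lexically Misleading Score (LMS) to each instance; only the subset with LMS > 0.7 is reported here. |
| 4 | LI-LI | Stress tests (antonym) | Naik et al. (2018) and Glockner et al. (2018) | Merge of the 'antonym' category in Naik et al. (2018) (from MNLI matched and mismatched dev sets) and Glockner et al. (2018) (SNLI training set). |
| 5 | LI-TS | Created by the authors | Created by the authors | Swap the two sentences in the original MultiNLI mismatched dev set. If the gold label is 'contradiction', the label of the swapped instance remains unchanged; otherwise it becomes 'non-contradiction'. |
| 6 | ST-WO | Word overlap | Naik et al. (2018) | 'Word overlap' category in Naik et al. (2018). |
| 7 | ST-NE | Negation | Naik et al. (2018) | 'Negation' category in Naik et al. (2018). |
| 8 | ST-LM | Length mismatch | Naik et al. (2018) | 'Length mismatch' category in Naik et al. (2018). |
| 9 | ST-SE | Spelling errors | Naik et al. (2018) | 'Spelling errors' category in Naik et al. (2018). |
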
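### Code to create the dataset
- Data processing: Python and pandas are used to read the raw data file, rename columns, and define a separate feature schema for each dataset type (three-class and binary).
- Dataset splits: each split is converted to a `Dataset` object according to its type, and the result is uploaded to the Hugging Face Hub.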



