pietrolesci/robust_nli

Hugging Face · Updated 2022-04-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/pietrolesci/robust_nli
Resource description:
## Overview

The original dataset is available in the original [GitHub repo](https://github.com/tyliupku/nli-debiasing-datasets).

This dataset is a collection of NLI benchmarks constructed as described in the paper [An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference](https://aclanthology.org/2020.conll-1.48/), published at CoNLL 2020.

## Dataset curation

No specific curation was applied to this dataset. Label encoding follows exactly what the authors report in the paper. Also, from the paper:

> _all the following datasets are collected based on the public available resources proposed by their authors, thus the experimental results in this paper are comparable to the numbers reported in the original papers and the other papers that use these datasets_

Most of the datasets included follow the usual 3-class NLI convention `{"entailment": 0, "neutral": 1, "contradiction": 2}`. However, the following datasets use a different label mapping:

- `IS-SD`: `{"non-entailment": 0, "entailment": 1}`
- `LI-TS`: `{"non-contradiction": 0, "contradiction": 1}`

## Dataset structure

This benchmark includes 10 adversarial datasets. To provide more insight into how the adversarial datasets attack the models, the authors categorized them according to the bias(es) they test and renamed them accordingly. More details are given in Section 2 of the paper.

A mapping to the original dataset names is provided below.

|    | Name  | Original Name          | Original Paper | Original Curation |
|---:|:------|:-----------------------|:---------------|:------------------|
|  0 | PI-CD | SNLI-Hard              | [Gururangan et al. (2018)](https://aclanthology.org/N18-2017/) | SNLI test set instances that cannot be correctly classified by a neural classifier (fastText) trained on only the hypothesis sentences. |
|  1 | PI-SP | MNLI-Hard              | [Liu et al. (2020)](https://aclanthology.org/2020.lrec-1.846/) | MNLI-mismatched dev set instances that cannot be correctly classified by surface patterns that are highly correlated with the labels. |
|  2 | IS-SD | HANS                   | [McCoy et al. (2019)](https://aclanthology.org/P19-1334/) | Dataset that tests lexical overlap, subsequence, and constituent heuristics between the hypothesis and premise sentences. |
|  3 | IS-CS | SoSwap-AddAMod         | [Nie et al. (2019)](https://dl.acm.org/doi/abs/10.1609/aaai.v33i01.33016867) | Pairs of sentences whose logical relations cannot be extracted from lexical information alone. Premises are taken from the SNLI dev set and modified. The original paper assigns a Lexically Misleading Score (LMS) to each instance. Here, only the subset with LMS > 0.7 is reported. |
|  4 | LI-LI | Stress tests (antonym) | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) and [Glockner et al. (2018)](https://aclanthology.org/P18-2103/) | Merge of the 'antonym' category in Naik et al. (2018) (from MNLI matched and mismatched dev sets) and Glockner et al. (2018) (SNLI training set). |
|  5 | LI-TS | Created by the authors | Created by the authors | Swap the two sentences in the original MultiNLI mismatched dev set. If the gold label is 'contradiction', the corresponding label in the swapped instance remains unchanged; otherwise it becomes 'non-contradiction'. |
|  6 | ST-WO | Word overlap           | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Word overlap' category in Naik et al. (2018). |
|  7 | ST-NE | Negation               | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Negation' category in Naik et al. (2018). |
|  8 | ST-LM | Length mismatch        | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Length mismatch' category in Naik et al. (2018). |
|  9 | ST-SE | Spelling errors        | [Naik et al. (2018)](https://aclanthology.org/C18-1198/) | 'Spelling errors' category in Naik et al. (2018). |

## Code to create the dataset

```python
import ast
from itertools import combinations

import pandas as pd
from datasets import Dataset, ClassLabel, Value, Features, DatasetDict

# split names use underscores in the data file (e.g. "IS_SD" for IS-SD)
Tri_dataset = ["IS_CS", "LI_LI", "PI_CD", "PI_SP", "ST_LM", "ST_NE", "ST_SE", "ST_WO"]
Ent_bin_dataset = ["IS_SD"]
Con_bin_dataset = ["LI_TS"]

# read data: one Python dict literal per line
with open("<path to file>/robust_nli.txt", encoding="utf-8", mode="r") as fl:
    f = fl.read().strip().split("\n")

# ast.literal_eval only parses literals, whereas eval would execute arbitrary code
f = [ast.literal_eval(i) for i in f]
df = pd.DataFrame(f)

# rename to map common names
df = df.rename(columns={"prem": "premise", "hypo": "hypothesis"})

# reorder columns
df = df.loc[:, ["idx", "split", "premise", "hypothesis", "label"]]

# create split-specific features
Tri_features = Features(
    {
        "idx": Value(dtype="int64"),
        "premise": Value(dtype="string"),
        "hypothesis": Value(dtype="string"),
        "label": ClassLabel(num_classes=3, names=["entailment", "neutral", "contradiction"]),
    }
)
Ent_features = Features(
    {
        "idx": Value(dtype="int64"),
        "premise": Value(dtype="string"),
        "hypothesis": Value(dtype="string"),
        "label": ClassLabel(num_classes=2, names=["non-entailment", "entailment"]),
    }
)
Con_features = Features(
    {
        "idx": Value(dtype="int64"),
        "premise": Value(dtype="string"),
        "hypothesis": Value(dtype="string"),
        "label": ClassLabel(num_classes=2, names=["non-contradiction", "contradiction"]),
    }
)

# convert to datasets
dataset_splits = {}
for split in df["split"].unique():
    print(split)
    df_split = df.loc[df["split"] == split].copy()
    if split in Tri_dataset:
        df_split["label"] = df_split["label"].map({"entailment": 0, "neutral": 1, "contradiction": 2})
        ds = Dataset.from_pandas(df_split, features=Tri_features)
    elif split in Ent_bin_dataset:
        df_split["label"] = df_split["label"].map({"non-entailment": 0, "entailment": 1})
        ds = Dataset.from_pandas(df_split, features=Ent_features)
    elif split in Con_bin_dataset:
        df_split["label"] = df_split["label"].map({"non-contradiction": 0, "contradiction": 1})
        ds = Dataset.from_pandas(df_split, features=Con_features)
    else:
        # unknown split: report it and skip, so a stale or undefined `ds` is never stored
        print("ERROR:", split)
        continue
    dataset_splits[split] = ds

datasets = DatasetDict(dataset_splits)
datasets.push_to_hub("pietrolesci/robust_nli", token="<your token>")

# check overlap between splits: count rows sharing premise, hypothesis, and label
for i, j in combinations(datasets.keys(), 2):
    print(
        f"{i} - {j}: ",
        pd.merge(
            datasets[i].to_pandas(),
            datasets[j].to_pandas(),
            on=["premise", "hypothesis", "label"],
            how="inner",
        ).shape[0],
    )
#> PI_SP - ST_LM: 0
#> PI_SP - ST_NE: 0
#> PI_SP - IS_CS: 0
#> PI_SP - LI_TS: 1
#> PI_SP - LI_LI: 0
#> PI_SP - ST_SE: 0
#> PI_SP - PI_CD: 0
#> PI_SP - IS_SD: 0
#> PI_SP - ST_WO: 0
#> ST_LM - ST_NE: 0
#> ST_LM - IS_CS: 0
#> ST_LM - LI_TS: 0
#> ST_LM - LI_LI: 0
#> ST_LM - ST_SE: 0
#> ST_LM - PI_CD: 0
#> ST_LM - IS_SD: 0
#> ST_LM - ST_WO: 0
#> ST_NE - IS_CS: 0
#> ST_NE - LI_TS: 0
#> ST_NE - LI_LI: 0
#> ST_NE - ST_SE: 0
#> ST_NE - PI_CD: 0
#> ST_NE - IS_SD: 0
#> ST_NE - ST_WO: 0
#> IS_CS - LI_TS: 0
#> IS_CS - LI_LI: 0
#> IS_CS - ST_SE: 0
#> IS_CS - PI_CD: 0
#> IS_CS - IS_SD: 0
#> IS_CS - ST_WO: 0
#> LI_TS - LI_LI: 0
#> LI_TS - ST_SE: 0
#> LI_TS - PI_CD: 0
#> LI_TS - IS_SD: 0
#> LI_TS - ST_WO: 0
#> LI_LI - ST_SE: 0
#> LI_LI - PI_CD: 0
#> LI_LI - IS_SD: 0
#> LI_LI - ST_WO: 0
#> ST_SE - PI_CD: 0
#> ST_SE - IS_SD: 0
#> ST_SE - ST_WO: 0
#> PI_CD - IS_SD: 0
#> PI_CD - ST_WO: 0
#> IS_SD - ST_WO: 0
```
Provider:
pietrolesci
Summary of original information

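Dataset overview

Dataset curation

  • Label encoding: follows what the authors report in the paper.
  • Special processing: no specific curation; all datasets are collected from the publicly available resources released by their original authors.
  • Label mapping:
    • IS-SD: {"non-entailment": 0, "entailment": 1}
    • LI-TS: {"non-contradiction": 0, "contradiction": 1}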
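
The label mappings listed above matter when a model trained on 3-class NLI is scored on the two binary splits. A minimal sketch of one way to fold 3-class predictions into the binary label spaces follows. The folding rule and the helper name `collapse_prediction` are assumptions made for illustration; the dataset card only states the label ids, not how predictions should be converted.

```python
# Label id -> name tables, copied from the label mappings listed above.
THREE_WAY = ["entailment", "neutral", "contradiction"]
ENT_BINARY = ["non-entailment", "entailment"]        # IS-SD
CON_BINARY = ["non-contradiction", "contradiction"]  # LI-TS


def collapse_prediction(pred_id: int, split: str) -> int:
    """Map a 3-class prediction id onto the label space of `split`.

    Assumption (not stated in the dataset card): on the binary splits,
    every 3-class label other than the target class folds into the
    "non-" class.
    """
    name = THREE_WAY[pred_id]
    key = split.replace("-", "_")  # accept both "IS-SD" and "IS_SD"
    if key == "IS_SD":
        target = "entailment" if name == "entailment" else "non-entailment"
        return ENT_BINARY.index(target)
    if key == "LI_TS":
        target = "contradiction" if name == "contradiction" else "non-contradiction"
        return CON_BINARY.index(target)
    return pred_id  # 3-class splits need no conversion


if __name__ == "__main__":
    for split in ("IS-SD", "LI-TS", "PI-CD"):
        print(split, [collapse_prediction(p, split) for p in range(3)])
```

Note that id 0 means "entailment" in the 3-class splits but "non-entailment" in IS-SD, so comparing raw ids across splits without a conversion like this would silently mis-score predictions.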

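Dataset structure

  • Composition: 10 adversarial datasets, renamed according to the type of bias they test.
  • Detailed mapping:

    | Index | Name  | Original Name          | Original Paper | Original Curation |
    |------:|:------|:-----------------------|:---------------|:------------------|
    | 0 | PI-CD | SNLI-Hard              | Gururangan et al. (2018) | SNLI test set instances that cannot be correctly classified by a neural classifier (fastText) trained on only the hypothesis sentences. |
    | 1 | PI-SP | MNLI-Hard              | Liu et al. (2020) | MNLI-mismatched dev set instances that cannot be correctly classified by surface patterns that are highly correlated with the labels. |
    | 2 | IS-SD | HANS                   | McCoy et al. (2019) | Dataset that tests lexical overlap, subsequence, and constituent heuristics between the hypothesis and premise sentences. |
    | 3 | IS-CS | SoSwap-AddAMod         | Nie et al. (2019) | Sentence pairs whose logical relations cannot be extracted from lexical information alone. Premises are taken from the SNLI dev set and modified. The original paper assigns a Lexically Misleading Score (LMS) to each instance; only the subset with LMS > 0.7 is reported here. |
    | 4 | LI-LI | Stress tests (antonym) | Naik et al. (2018) and Glockner et al. (2018) | Merge of the 'antonym' category in Naik et al. (2018) (from MNLI matched and mismatched dev sets) and Glockner et al. (2018) (SNLI training set). |
    | 5 | LI-TS | Created by the authors | Created by the authors | Swap the two sentences in the original MultiNLI mismatched dev set. If the gold label is 'contradiction', the label of the swapped instance remains unchanged; otherwise it becomes 'non-contradiction'. |
    | 6 | ST-WO | Word overlap           | Naik et al. (2018) | 'Word overlap' category in Naik et al. (2018). |
    | 7 | ST-NE | Negation               | Naik et al. (2018) | 'Negation' category in Naik et al. (2018). |
    | 8 | ST-LM | Length mismatch        | Naik et al. (2018) | 'Length mismatch' category in Naik et al. (2018). |
    | 9 | ST-SE | Spelling errors        | Naik et al. (2018) | 'Spelling errors' category in Naik et al. (2018). |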
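
The LI-TS row in the mapping describes a simple transformation: swap the two sentences and keep the label only when it is 'contradiction'. The sketch below illustrates that rule on plain dictionaries. The function name `swap_pair` and the example sentences are made up for illustration; this is not the authors' construction script.

```python
def swap_pair(example: dict) -> dict:
    """Swap premise and hypothesis and relabel as described for LI-TS.

    'contradiction' is kept as is; 'entailment' and 'neutral' both
    become 'non-contradiction'.
    """
    if example["label"] == "contradiction":
        label = "contradiction"
    else:
        label = "non-contradiction"
    return {
        "premise": example["hypothesis"],
        "hypothesis": example["premise"],
        "label": label,
    }


if __name__ == "__main__":
    # Hypothetical 3-class examples, not taken from MultiNLI.
    pairs = [
        {"premise": "A man is asleep.", "hypothesis": "A man is awake.", "label": "contradiction"},
        {"premise": "A dog runs in a park.", "hypothesis": "An animal is outside.", "label": "entailment"},
    ]
    for pair in pairs:
        print(swap_pair(pair))
```

The asymmetry in the relabeling reflects that contradiction holds in both directions, whereas entailment and neutral do not necessarily survive a swap, which is why the swapped instances are only labeled as contradiction or not.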

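Dataset creation code

  • Data processing: Python and pandas are used to read the raw data file, rename columns, and build a separate feature definition for each dataset type (3-class, binary).
  • Dataset splits: each split is converted to a Dataset object according to its type and uploaded to the Hugging Face Hub.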
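
The reading and renaming steps summarized above can be tried without pandas or the `datasets` library. The sketch below assumes each line of the raw file is a Python dict literal with the keys `idx`, `split`, `prem`, `hypo`, and `label`, which is inferred from the creation code rather than documented. The sample lines and the helper name `parse_records` are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical sample lines in the assumed one-dict-literal-per-line format.
RAW_LINES = [
    "{'idx': 0, 'split': 'IS_SD', 'prem': 'The doctor saw the lawyer.', "
    "'hypo': 'The lawyer saw the doctor.', 'label': 'non-entailment'}",
    "{'idx': 1, 'split': 'PI_CD', 'prem': 'A child plays outside.', "
    "'hypo': 'A child is outdoors.', 'label': 'entailment'}",
    "{'idx': 2, 'split': 'PI_CD', 'prem': 'A woman reads a book.', "
    "'hypo': 'A woman is swimming.', 'label': 'contradiction'}",
]


def parse_records(lines):
    """Parse dict-literal lines, rename columns, and group records by split."""
    by_split = defaultdict(list)
    for line in lines:
        # literal_eval accepts only Python literals, so no code is executed.
        record = ast.literal_eval(line)
        record["premise"] = record.pop("prem")
        record["hypothesis"] = record.pop("hypo")
        split = record.pop("split")
        by_split[split].append(record)
    return dict(by_split)


if __name__ == "__main__":
    for split, records in parse_records(RAW_LINES).items():
        print(split, len(records), records[0]["label"])
```

Grouping by split first makes it easy to apply the right label mapping per split afterwards, since the 3-class and binary splits share the same column layout but not the same label names.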