bias-amplified-splits/qqp
Source: Hugging Face; updated 2023-07-04; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/bias-amplified-splits/qqp
---
license: cc-by-4.0
dataset_info:
- config_name: minority_examples
  features:
  - name: question1
    dtype: string
  - name: question2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': not_duplicate
          '1': duplicate
  - name: idx
    dtype: int32
  splits:
  - name: train.biased
    num_bytes: 42391456
    num_examples: 297735
  - name: train.anti_biased
    num_bytes: 8509364
    num_examples: 66111
  - name: validation.biased
    num_bytes: 4698206
    num_examples: 32968
  - name: validation.anti_biased
    num_bytes: 955548
    num_examples: 7462
  download_size: 70726976
  dataset_size: 56554574
- config_name: partial_input
  features:
  - name: question1
    dtype: string
  - name: question2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': not_duplicate
          '1': duplicate
  - name: idx
    dtype: int32
  splits:
  - name: train.biased
    num_bytes: 42788212
    num_examples: 297735
  - name: train.anti_biased
    num_bytes: 8112608
    num_examples: 66111
  - name: validation.biased
    num_bytes: 4712327
    num_examples: 33084
  - name: validation.anti_biased
    num_bytes: 941427
    num_examples: 7346
  download_size: 70726976
  dataset_size: 56554574
task_categories:
- text-classification
language:
- en
pretty_name: Quora Questions Pairs
---
# Dataset Card for Bias-amplified Splits for QQP
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Annotations](#annotations)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Citation Information](#citation-information)
## Dataset Description
- **Repository:** [Fighting Bias with Bias repo](https://github.com/schwartz-lab-nlp/fight-bias-with-bias)
- **Paper:** [arXiv](https://arxiv.org/abs/2305.18917)
- **Point of Contact:** [Yuval Reif](mailto:yuval.reif@mail.huji.ac.il)
- **Original Dataset's Paper:** [GLUE](https://arxiv.org/abs/1804.07461)
### Dataset Summary
Bias-amplified splits is a novel evaluation framework for assessing model robustness: it amplifies dataset biases in the training data and challenges models to generalize beyond them. The framework is defined by a bias-amplified training set and a hard, anti-biased test set, both of which we extract automatically from existing datasets using model-based methods.
Our experiments show that the identified anti-biased examples are naturally challenging for models. Moreover, models trained on bias-amplified data exhibit dramatic performance drops on anti-biased examples, and common approaches to improving generalization do not mitigate these drops.
Here we apply our framework to the Quora Question Pairs dataset (QQP), which consists of question pairs; the task is to determine whether the two questions are paraphrases of each other (i.e., whether they have the same meaning).
Our evaluation framework can be applied to any existing dataset, even those considered obsolete, to test model robustness. We hope our work will guide the development of robust models that do not rely on superficial biases and correlations.
#### Evaluation Results (DeBERTa-large)
##### For splits based on minority examples:
| Training Data \ Test Data | Original test | Anti-biased test |
|---------------------------|---------------|------------------|
| Original training split | 93.0 | 77.6 |
| Biased training split | 87.0 | 36.8 |
##### For splits based on partial-input model:
| Training Data \ Test Data | Original test | Anti-biased test |
|---------------------------|---------------|------------------|
| Original training split | 93.0 | 81.3 |
| Biased training split | 90.3 | 63.9 |
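To make the size of these drops concrete, the short sketch below recomputes them from the two tables. It only does arithmetic on the reported scores; the dictionary layout and the function names `gap` and `anti_biased_drop` are illustrative and are not part of the dataset or its repository.

```python
# Scores copied from the two DeBERTa-large tables above.
# Key: (split method, training data) -> (original test score, anti-biased test score).
RESULTS = {
    ("minority_examples", "original"): (93.0, 77.6),
    ("minority_examples", "biased"): (87.0, 36.8),
    ("partial_input", "original"): (93.0, 81.3),
    ("partial_input", "biased"): (90.3, 63.9),
}


def gap(method: str, train: str) -> float:
    """Points lost when moving from the original test set to the anti-biased one."""
    original, anti_biased = RESULTS[(method, train)]
    return round(original - anti_biased, 1)


def anti_biased_drop(method: str) -> float:
    """Points lost on the anti-biased test set when training on the biased split
    instead of the original training split."""
    _, from_original = RESULTS[(method, "original")]
    _, from_biased = RESULTS[(method, "biased")]
    return round(from_original - from_biased, 1)


if __name__ == "__main__":
    for method in ("minority_examples", "partial_input"):
        print(method, gap(method, "original"), gap(method, "biased"), anti_biased_drop(method))
```

For the minority-examples splits, for instance, training on the biased split costs 40.8 points on the anti-biased test set (77.6 versus 36.8), compared with 17.4 points for the partial-input splits.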
#### Loading the Data
```python
from datasets import load_dataset
# choose which bias detection method to use for the bias-amplified splits: either "minority_examples" or "partial_input"
dataset = load_dataset("bias-amplified-splits/qqp", "minority_examples")
# use the biased training split and anti-biased test split
train_dataset = dataset['train.biased']
eval_dataset = dataset['validation.anti_biased']
```
## Dataset Structure
### Data Instances
Data instances are taken directly from QQP (GLUE version) and re-split into biased and anti-biased subsets. Here is an example instance from the dataset:
```json
{
"idx": 56,
"question1": "How do I buy used car in India?",
"question2": "Which used car should I buy in India?",
"label": 0
}
```
### Data Fields
- `idx`: unique identifier for the example within its original data split (e.g., the validation set)
- `question1`: a question asked on Quora
- `question2`: a question asked on Quora
- `label`: either `0` (`not_duplicate`) or `1` (`duplicate`)
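As an illustration of the field layout, the following sketch parses the instance shown above using only the standard library, checks it against the feature types declared in the card's metadata, and maps the integer label to its class name. The helper name `parse_instance` and the added `label_name` key are hypothetical; when loading with the Hugging Face `datasets` library, the class names are instead available from the `label` feature's `names` attribute.

```python
import json

# Class names as declared under `class_label` in the card's YAML metadata.
LABEL_NAMES = ["not_duplicate", "duplicate"]

# Expected Python types for each field, following the `features` metadata.
SCHEMA = {"idx": int, "question1": str, "question2": str, "label": int}


def parse_instance(raw: str) -> dict:
    """Parse one JSON instance, validate its fields, and attach a readable label."""
    record = json.loads(raw)
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for name, expected in SCHEMA.items():
        if not isinstance(record[name], expected):
            raise TypeError(f"{name} should be of type {expected.__name__}")
    if record["label"] not in range(len(LABEL_NAMES)):
        raise ValueError(f"label out of range: {record['label']}")
    # Hypothetical convenience key holding the class name for the integer label.
    record["label_name"] = LABEL_NAMES[record["label"]]
    return record


# The example instance from the Data Instances section.
EXAMPLE = """{
  "idx": 56,
  "question1": "How do I buy used car in India?",
  "question2": "Which used car should I buy in India?",
  "label": 0
}"""

if __name__ == "__main__":
    print(parse_instance(EXAMPLE)["label_name"])  # not_duplicate
```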
### Data Splits
Bias-amplified splits require a method to detect *biased* and *anti-biased* examples in datasets. We release bias-amplified splits created with each of the following two methods:
- **Minority examples**: A novel method we introduce that leverages representation learning and clustering for identifying anti-biased *minority examples* (Tu et al., 2020)—examples that defy common statistical patterns found in the rest of the dataset.
- **Partial-input baselines**: A common method for identifying biased examples that contain annotation artifacts: it examines the performance of models restricted to using only part of the input. If such models succeed, they must be relying on unintended or spurious patterns in the dataset.
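The core idea behind minority examples, namely examples whose label defies the dominant pattern among similar examples, can be illustrated with a simplified sketch. It assumes that each example has already been assigned to a cluster (for instance by clustering model representations); the cluster ids are hypothetical inputs, and the rule "label differs from the cluster's majority label" is a simplification. The encoder, clustering algorithm, and number of clusters actually used are described in the paper.

```python
from collections import Counter


def split_by_cluster_majority(cluster_ids, labels):
    """Return (biased, anti_biased) lists of example indices.

    ASSUMPTION: cluster_ids[i] is a precomputed cluster assignment for example i.
    An example counts as a minority (anti-biased) example when its label differs
    from the most common label in its cluster; all others count as biased.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")

    # Count how often each label occurs within each cluster.
    counts = {}
    for cluster, label in zip(cluster_ids, labels):
        counts.setdefault(cluster, Counter())[label] += 1

    # Majority label per cluster (ties go to the label seen first in that cluster).
    majority = {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}

    biased, anti_biased = [], []
    for i, (cluster, label) in enumerate(zip(cluster_ids, labels)):
        (biased if label == majority[cluster] else anti_biased).append(i)
    return biased, anti_biased


if __name__ == "__main__":
    # Two toy clusters: cluster 0 is mostly label 1, cluster 1 is mostly label 0.
    print(split_by_cluster_majority([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 1]))
    # ([0, 1, 3, 4], [2, 5])
```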
Using each of the two methods, we split each of the original training and validation splits into biased and anti-biased subsets. See the [paper](https://arxiv.org/abs/2305.18917) for more details.
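The partial-input criterion can be sketched in the same style. This sketch assumes that, for every example, you already have the probability a partial-input model assigns to the gold label, ideally from held-out predictions. For QQP such a model might see only one of the two questions, but that choice, the function name, and the 0.5 threshold are all assumptions for illustration; the paper's exact selection rule may differ.

```python
def split_by_partial_input(gold_probs, threshold=0.5):
    """Return (biased, anti_biased) lists of example indices.

    ASSUMPTION: gold_probs[i] is the probability that a partial-input model
    assigns to the gold label of example i. Examples the partial-input model
    solves confidently (probability above the hypothetical threshold) are
    treated as biased; the remaining examples are treated as anti-biased.
    """
    biased, anti_biased = [], []
    for i, p in enumerate(gold_probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range at index {i}: {p}")
        (biased if p > threshold else anti_biased).append(i)
    return biased, anti_biased


if __name__ == "__main__":
    print(split_by_partial_input([0.9, 0.2, 0.55, 0.5]))  # ([0, 2], [1, 3])
```

A strict `>` comparison is used, so an example sitting exactly at the threshold is kept in the harder, anti-biased subset.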
#### Minority Examples
| Dataset Split | Number of Instances in Split |
|--------------------------|------------------------------|
| Train - biased | 297735 |
| Train - anti-biased | 66111 |
| Validation - biased | 32968 |
| Validation - anti-biased | 7462 |
#### Partial-input Baselines
| Dataset Split | Number of Instances in Split |
|--------------------------|------------------------------|
| Train - biased | 297735 |
| Train - anti-biased | 66111 |
| Validation - biased | 33084 |
| Validation - anti-biased | 7346 |
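Because both methods re-split the same original data, the biased and anti-biased counts in each table should add up to the same split sizes. The sketch below checks this and computes the share of each original split that ends up in the anti-biased subset; the numbers are copied from the two tables, and the function names are illustrative.

```python
# Instance counts copied from the two tables above.
SPLITS = {
    "minority_examples": {
        "train": {"biased": 297735, "anti_biased": 66111},
        "validation": {"biased": 32968, "anti_biased": 7462},
    },
    "partial_input": {
        "train": {"biased": 297735, "anti_biased": 66111},
        "validation": {"biased": 33084, "anti_biased": 7346},
    },
}


def totals(method):
    """Size of each original split, i.e. biased + anti-biased counts."""
    return {split: sum(parts.values()) for split, parts in SPLITS[method].items()}


def anti_biased_share(method, split):
    """Fraction of an original split assigned to the anti-biased subset."""
    parts = SPLITS[method][split]
    return parts["anti_biased"] / sum(parts.values())


if __name__ == "__main__":
    for method in SPLITS:
        shares = {s: round(anti_biased_share(method, s), 3) for s in SPLITS[method]}
        print(method, totals(method), shares)
```

Both methods partition the same 363,846 training and 40,430 validation examples, with roughly 18% of each split assigned to the anti-biased subset.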
## Dataset Creation
### Curation Rationale
NLP models often rely on superficial cues known as *dataset biases* to achieve impressive performance, and can fail on examples where these biases do not hold. To develop more robust, unbiased models, recent work aims to filter biased examples from training sets. We argue that in order to encourage the development of robust models, we should in fact **amplify** biases in the training sets, while adopting the challenge set approach and making test sets anti-biased. To implement our approach, we introduce a simple framework that can be applied automatically to any existing dataset to use it for testing model robustness.
### Annotations
#### Annotation process
No new annotations are required to create bias-amplified splits. Existing data instances are split into *biased* and *anti-biased* splits based on automatic model-based methods to detect such examples.
## Considerations for Using the Data
### Social Impact of Dataset
Bias-amplified splits were created to promote the development of robust NLP models that do not rely on superficial biases and correlations, and to provide more challenging evaluations of existing systems.
### Discussion of Biases
We propose to use bias-amplified splits to complement benchmarks with challenging evaluation settings that test model robustness, in addition to the dataset’s main training and test sets. As such, while existing dataset biases are *amplified* during training with bias-amplified splits, these splits are intended primarily for model evaluation, to expose the bias-exploiting behaviors of models and to identify more robust models and effective robustness interventions.
## Additional Information
### Dataset Curators
Bias-amplified splits were introduced by Yuval Reif and Roy Schwartz from the [Hebrew University of Jerusalem](https://schwartz-lab-huji.github.io).
The QQP data was released by Quora and is distributed as part of the GLUE benchmark.
### Citation Information
```
@misc{reif2023fighting,
title = "Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases",
author = "Yuval Reif and Roy Schwartz",
month = may,
year = "2023",
url = "https://arxiv.org/pdf/2305.18917",
}
```
Source dataset:
```
@inproceedings{wang2019glue,
title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
note={In the Proceedings of ICLR.},
year={2019}
}
```
Provided by:
bias-amplified-splits