vs16/counter-hate-dataset
Source: Hugging Face. Updated 2026-04-02; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/vs16/counter-hate-dataset
Resource description:
---
dataset_info:
  features:
  - name: original_sample_id
    dtype: string
  - name: counterfactual_id
    dtype: string
  - name: text
    dtype: string
  - name: class_label
    dtype: string
  - name: target_group
    dtype: string
  - name: polarity
    dtype: string
  - name: hate_score
    dtype: float32
  - name: confidence
    dtype: float32
  - name: cf_type
    dtype: string
  - name: t2i_prompt
    dtype: string
  - name: image_path
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 4624365
    num_examples: 12597
  - name: validation
    num_bytes: 1007932
    num_examples: 2700
  - name: test
    num_bytes: 1010867
    num_examples: 2703
license: cc-by-4.0
size_gb: 2.3
---
# Counter-Hate Dataset
A large-scale multimodal dataset for studying fairness and bias in hate speech detection systems with counterfactual augmentation.
**🎯 Target venue:** ACM Multimedia 2026
## Dataset Description
This dataset contains **18,000 text-image pairs**, each labeled with one of 8 classes (4 hate speech, 4 non-hate) and with varying levels of protected group representation. It was created to evaluate whether Counterfactual Data Augmentation (CDA) introduces or amplifies bias in hate speech detection models.
### Key Features:
- **Multimodal**: Text + Generated Images (from T2I models like Z-Image-Turbo)
- **18,000 samples**: 6,000 original texts + 12,000 counterfactual variants via identity-term substitution
- **8 hate speech classes**:
  - hate_race, hate_religion, hate_gender, hate_other
  - offensive_non_hate, neutral_discussion, counter_speech, ambiguous
- **8 protected groups**:
  - race/ethnicity, religion, gender, sexual_orientation, national_origin/citizenship, disability, age, multiple/none
- **Stratified splits**: train/val/test (70% / 15% / 15%), with originals only in val/test
- **Experimental conditions**: nCF (6K originals) and CF (18K with counterfactuals)
## Dataset Structure
```
counter-hate-dataset/
├── data/
│   ├── train.csv        # 12,597 training samples (includes counterfactuals)
│   ├── validation.csv   # 2,700 validation samples (originals only)
│   ├── test.csv         # 2,703 test samples (originals only)
│   └── dataset.csv      # All samples combined
├── images/
│   ├── hate/            # 9,000 images for hate speech samples
│   └── non_hate/        # 9,000 images for non-hate samples
└── README.md            # This file
```
## Features
Each row in the CSV files contains:
| Column | Type | Description |
|--------|------|-------------|
| `original_sample_id` | str | Unique identifier for the original sample (e.g., HS_HATE_RACE_0001) |
| `counterfactual_id` | str | Unique ID for this variant (original or counterfactual_X) |
| `text` | str | The actual text content |
| `class_label` | str | One of 8 hate speech categories |
| `target_group` | str | Protected group mentioned in text (8 groups) |
| `polarity` | str | Either 'hate' or 'non-hate' |
| `hate_score` | float | Numeric hate score from annotations |
| `confidence` | float | Confidence in the annotation (0-10 scale) |
| `cf_type` | str | 'original' or 'counterfactual_1', 'counterfactual_2', etc. |
| `t2i_prompt` | str | Text-to-image generation prompt |
| `image_path` | str | Relative path to corresponding PNG image |
| `split` | str | 'train', 'validation', or 'test' |
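
As a minimal sketch of how the columns above map onto Python types, the following parses CSV rows into a small dataclass and casts the two float columns. The `Sample` class, the `from_row` helper, and the demo row are illustrative assumptions and are not shipped with the dataset; the sketch also assumes the CSV files contain exactly the twelve columns listed in the table.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Sample:
    """One CSV row, with fields named after the documented columns."""
    original_sample_id: str
    counterfactual_id: str
    text: str
    class_label: str
    target_group: str
    polarity: str
    hate_score: float
    confidence: float
    cf_type: str
    t2i_prompt: str
    image_path: str
    split: str

    @classmethod
    def from_row(cls, row):
        """Build a Sample from a csv.DictReader row, casting the float columns."""
        values = dict(row)
        values["hate_score"] = float(values["hate_score"])
        values["confidence"] = float(values["confidence"])
        return cls(**values)


# Made-up row for illustration only; it is not taken from the dataset.
demo_csv = io.StringIO(
    "original_sample_id,counterfactual_id,text,class_label,target_group,polarity,"
    "hate_score,confidence,cf_type,t2i_prompt,image_path,split\n"
    "HS_DEMO_0001,original,placeholder text,neutral_discussion,multiple/none,non-hate,"
    "0.0,9.5,original,placeholder prompt,placeholder.png,train\n"
)
samples = [Sample.from_row(row) for row in csv.DictReader(demo_csv)]
print(samples[0].class_label, samples[0].confidence)
```

To read a real split, replace `demo_csv` with an open file handle for `data/train.csv`.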
## Splits
Training conditions and split statistics:
### Condition nCF (No Counterfactual)
- Train: 4,158 original samples
- Validation: 891 original samples
- Test: 892 original samples
- **Total: 5,941 samples**
### Condition CF (With Counterfactual)
- Train: 12,597 samples (includes counterfactuals)
- Validation: 2,700 samples (originals only)
- Test: 2,703 samples (originals only)
- **Total: 18,000 samples**
All splits are stratified by `class_label` to preserve class distribution.
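
The CF split counts quoted above can be checked against the 70% / 15% / 15% target with a few lines of arithmetic. This is only a sanity check on the published numbers; the variable names are illustrative.

```python
# CF condition split sizes as listed in this card.
cf_counts = {"train": 12_597, "validation": 2_700, "test": 2_703}

total = sum(cf_counts.values())
# Share of each split, in percent, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in cf_counts.items()}

print(total)   # 18000
print(shares)  # {'train': 69.98, 'validation': 15.0, 'test': 15.02}
```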
## Class Distribution
Each class contains 750 original samples:
- **Hate Speech (4 classes)**:
  - hate_race: 750
  - hate_religion: 750
  - hate_gender: 750
  - hate_other: 750
- **Non-Hate (4 classes)**:
  - offensive_non_hate: 750
  - neutral_discussion: 750
  - counter_speech: 750
  - ambiguous: 750
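
The per-class counts tie back to the headline totals as follows. The sketch assumes that counterfactuals are spread evenly across originals (two per original), which is inferred from the 6,000 / 12,000 figures rather than stated explicitly in this card.

```python
ORIGINALS_PER_CLASS = 750
HATE_CLASSES = ["hate_race", "hate_religion", "hate_gender", "hate_other"]
NON_HATE_CLASSES = ["offensive_non_hate", "neutral_discussion", "counter_speech", "ambiguous"]

originals = ORIGINALS_PER_CLASS * (len(HATE_CLASSES) + len(NON_HATE_CLASSES))
# Assumption: the 12,000 counterfactuals are divided evenly among the originals.
variants_per_original = 12_000 // originals
total_samples = originals * (1 + variants_per_original)
hate_images = ORIGINALS_PER_CLASS * len(HATE_CLASSES) * (1 + variants_per_original)

print(originals, variants_per_original, total_samples, hate_images)
# 6000 2 18000 9000
```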
## Image Generation
Images were generated using the **Z-Image-Turbo** text-to-image model with prompts derived from the text content. Each image corresponds to a text sample and represents the hate speech category and context. Images are stored as PNG files organized by category.
### Image Organization
```
images/
├── hate/
│   ├── Hate_Gender/   # Gender-targeted hate speech
│   ├── Hate_Others/   # Other hate categories
│   ├── Hate_race/     # Race-targeted hate speech
│   └── ...
└── non_hate/
    ├── generated_images-ambigious/
    ├── generated_images-counter-speech/
    ├── generated_images-neutral/
    └── ...
```
## Usage
### Load with Hugging Face Datasets
```python
from datasets import load_dataset
# Load the full dataset
dataset = load_dataset('vs16/counter-hate-dataset')
# Access specific split
train_data = dataset['train']
val_data = dataset['validation']
test_data = dataset['test']
# Access a sample
sample = train_data[0]
print(sample['text'])
print(sample['image_path'])
```
### Load with Pandas
```python
import pandas as pd
from PIL import Image
# Load a specific split
train_df = pd.read_csv('data/train.csv')
# Access a row
sample = train_df.iloc[0]
print(sample['text'])
print(sample['class_label'])
# Load the image
image = Image.open(f"images/{sample['image_path']}")
image.show()
```
### Stratified Train/Val/Test Split
```python
import pandas as pd
df = pd.read_csv('data/dataset.csv')
# The splits are precomputed, so filter on the provided `split` column
train_df = df[df['split'] == 'train']
val_df = df[df['split'] == 'validation']
test_df = df[df['split'] == 'test']
```
## Experimental Methodology
### Counterfactual Generation
Counterfactual samples were generated via **identity-term substitution**:
1. Extract identity terms (group descriptors) from original text
2. Replace with alternative terms for the same attribute dimension
3. Examples:
- "Muslim" → "Christian", "Jewish", "Hindu", etc.
- "Black" → "Asian", "Hispanic", "Native American", etc.
- "woman" → "man", etc.
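
The steps above can be sketched as a simple whole-word substitution. This is an illustration of the general technique, not the authors' pipeline: the `IDENTITY_TERMS` table is hypothetical and built only from the examples listed here, and the default of two variants per text is inferred from the 6,000 / 12,000 sample counts.

```python
import re

# Hypothetical substitution table built only from the examples in this card;
# the dataset's actual term lists are not documented here.
IDENTITY_TERMS = {
    "religion": ["Muslim", "Christian", "Jewish", "Hindu"],
    "race": ["Black", "Asian", "Hispanic", "Native American"],
    "gender": ["woman", "man"],
}


def make_counterfactuals(text, max_variants=2):
    """Return up to max_variants copies of text with one identity term swapped.

    Only the first identity term found is substituted, and replacements stay
    within the same attribute dimension (religion, race, or gender).
    """
    for terms in IDENTITY_TERMS.values():
        for term in terms:
            # Whole-word match, so "man" does not match inside "woman".
            pattern = re.compile(rf"\b{re.escape(term)}\b")
            if pattern.search(text):
                alternatives = [t for t in terms if t != term]
                return [pattern.sub(alt, text) for alt in alternatives[:max_variants]]
    return []  # no known identity term found


print(make_counterfactuals("A Muslim family moved in next door."))
# ['A Christian family moved in next door.', 'A Jewish family moved in next door.']
```

A production version would also need to handle casing, plurals, and texts that mention several groups, none of which this sketch attempts.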
### Data Preparation (Conditions)
- **nCF (No Counterfactual)**: Only original 6,000 samples
- **CF (With Counterfactual)**: 6,000 originals + 12,000 counterfactual variants
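
One way to derive the two training sets from the combined CSV is to filter on the `split` and `cf_type` columns. The helper below is a sketch under the assumption that nCF training data is simply the training rows whose `cf_type` is `'original'`; the function name and the demo rows are made up for illustration.

```python
def training_rows(rows, condition):
    """Select training rows for the 'nCF' or 'CF' condition.

    rows is an iterable of dicts, e.g. from csv.DictReader over data/dataset.csv.
    """
    train = [r for r in rows if r["split"] == "train"]
    if condition == "CF":
        return train  # originals plus counterfactual variants
    if condition == "nCF":
        return [r for r in train if r["cf_type"] == "original"]
    raise ValueError(f"unknown condition: {condition!r}")


# Made-up rows for illustration only.
demo_rows = [
    {"original_sample_id": "A", "cf_type": "original", "split": "train"},
    {"original_sample_id": "A", "cf_type": "counterfactual_1", "split": "train"},
    {"original_sample_id": "A", "cf_type": "counterfactual_2", "split": "train"},
    {"original_sample_id": "B", "cf_type": "original", "split": "test"},
]
print(len(training_rows(demo_rows, "nCF")), len(training_rows(demo_rows, "CF")))
# 1 3
```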
### Fair Evaluation Protocol
- Validation and test sets contain **only original samples** for both conditions
- Training set includes counterfactuals in CF condition
- Prevents data leakage and ensures comparable evaluation
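
The protocol above can be verified mechanically before training. The following sketch flags two kinds of violation: an `original_sample_id` that appears in more than one split, and a non-original row in validation or test. The function name and demo rows are illustrative assumptions, not part of the dataset tooling.

```python
from collections import defaultdict


def protocol_violations(rows):
    """Return human-readable descriptions of evaluation-protocol violations."""
    splits_by_id = defaultdict(set)
    problems = []
    for r in rows:
        splits_by_id[r["original_sample_id"]].add(r["split"])
        # Validation and test are expected to hold original samples only.
        if r["split"] in ("validation", "test") and r["cf_type"] != "original":
            problems.append(
                f"{r['original_sample_id']}: {r['cf_type']} found in {r['split']}"
            )
    # Every variant of a given original should live in a single split.
    for sample_id, splits in sorted(splits_by_id.items()):
        if len(splits) > 1:
            problems.append(f"{sample_id}: appears in {sorted(splits)}")
    return problems


# Made-up rows for illustration only.
clean = [
    {"original_sample_id": "A", "cf_type": "original", "split": "train"},
    {"original_sample_id": "A", "cf_type": "counterfactual_1", "split": "train"},
    {"original_sample_id": "B", "cf_type": "original", "split": "test"},
]
leaky = clean + [
    {"original_sample_id": "A", "cf_type": "counterfactual_2", "split": "test"},
]
print(protocol_violations(clean))
print(protocol_violations(leaky))
```

An empty list means both checks passed for the rows supplied.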
## Citation
If you use this dataset in your research, please cite:
```bibtex
@dataset{vs16_counter_hate_2026,
title={Counter-Hate Dataset: A Multimodal Benchmark for Studying Fairness and Bias in Hate Speech Detection},
year={2026},
publisher={Hugging Face Datasets},
url={https://huggingface.co/datasets/vs16/counter-hate-dataset}
}
```
And the original work:
```bibtex
@article{kennedy2020measuring,
title={Measuring the Reliability of Hate Speech Annotations: The Case of the European Parliament Debates},
author={Kennedy, Bing and Atkinson, David and others},
year={2020}
}
```
## License
The dataset is released under the CC BY 4.0 license (`cc-by-4.0` in the dataset metadata) and is provided for **research purposes only**. Users must comply with applicable laws and ethical guidelines when working with this data.
## Ethical Considerations
⚠️ **Important Note**: This dataset contains hate speech and offensive language for research purposes only.
### Responsible Use Guidelines
1. **Research Purpose Only**: Use solely for studying bias, fairness, and detection systems
2. **Do Not Amplify**: Do not use to train systems that amplify or spread hate speech
3. **Sensitivity**: Be aware of the sensitive nature of the content
4. **Attribution**: Always cite and credit the dataset source
5. **Report Issues**: Report any misuse or ethical concerns
6. **Institutional Review**: Consider IRB approval for related human studies
### Protected Groups Representation
The dataset explicitly includes diverse protected groups to ensure comprehensive bias evaluation. This is intentional and necessary for fairness research.
## Dataset Statistics
- **Total samples**: 18,000 text-image pairs
- **Original samples**: 6,000
- **Counterfactual variants**: 12,000
- **Image files**: 18,000 PNG images
- **Total size**: ~2.3 GB (with images)
- **CSV size**: ~8 MB (combined)
- **Average text length**: 150-300 characters
- **Image resolution**: 512x512px (typical for T2I models)
## Reproducibility
All train/val/test assignments are deterministic and reproducible:
- **Random seed**: 42
- **Stratification**: By class_label (8 classes)
- **Split level**: By original_sample_id (group-level splitting)
- **Canonical source**: `canonical_splits.json` defines all splits
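
For readers who want to see what a seeded, class-stratified, group-level split looks like, here is a minimal standard-library sketch. It is not the authors' procedure and is not expected to reproduce the published assignments; `canonical_splits.json` remains the authoritative source. The function name, the rounding rule, and the demo identifiers are assumptions.

```python
import random
from collections import Counter, defaultdict


def group_stratified_split(id_to_label, seed=42, fractions=(0.70, 0.15)):
    """Assign each original_sample_id to train/validation/test within its class.

    Splitting at the id level keeps an original and all of its counterfactual
    variants in the same split. fractions gives the train and validation
    shares; the remainder goes to test.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in sorted(id_to_label.items()):
        by_label[label].append(sample_id)

    assignment = {}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        for i, sample_id in enumerate(ids):
            if i < n_train:
                assignment[sample_id] = "train"
            elif i < n_train + n_val:
                assignment[sample_id] = "validation"
            else:
                assignment[sample_id] = "test"
    return assignment


# Made-up ids: 100 originals in each of two classes.
demo_ids = {f"HS_DEMO_{label}_{i:04d}": label
            for label in ("hate_race", "ambiguous") for i in range(100)}
demo_assignment = group_stratified_split(demo_ids)
print(Counter(demo_assignment.values()))
```

Each row then inherits the split of its `original_sample_id`, so counterfactuals never straddle splits.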
## Known Limitations
1. Images are synthetically generated and may not perfectly represent real-world scenarios
2. Counterfactual generation via term substitution is limited to identity/demographic attributes
3. Limited to English text
4. Hate speech prevalence is intentionally high (for detection research)
5. May not generalize to all hate speech types or contexts
## Questions & Support
For issues, questions, or suggestions:
- 🐛 Dataset issues: Report on GitHub
- 📧 Contact: via Hugging Face dataset page
- 💬 Discussions: Use the Hugging Face Discussions tab
## Related Work
This dataset builds on and relates to:
- UCBerkeley-DLab's "Measuring Hate Speech" dataset
- HateBERT and other hate speech detection models
- Fairness and bias research in NLP
- Counterfactual data augmentation literature
---
**Dataset Version**: 1.0
**Last Updated**: April 2, 2026
**Status**: Ready for research use
Provider:
vs16



