vs16/counter-hate-dataset

Name: vs16/counter-hate-dataset
Creator: vs16
Published: 2026-04-02 09:52:52
License: No description provided

Hugging Face · Updated 2026-04-02 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/vs16/counter-hate-dataset

Description:

---
dataset_info:
  features:
  - name: original_sample_id
    dtype: string
  - name: counterfactual_id
    dtype: string
  - name: text
    dtype: string
  - name: class_label
    dtype: string
  - name: target_group
    dtype: string
  - name: polarity
    dtype: string
  - name: hate_score
    dtype: float32
  - name: confidence
    dtype: float32
  - name: cf_type
    dtype: string
  - name: t2i_prompt
    dtype: string
  - name: image_path
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 4624365
    num_examples: 12597
  - name: validation
    num_bytes: 1007932
    num_examples: 2700
  - name: test
    num_bytes: 1010867
    num_examples: 2703
license: cc-by-4.0
size_gb: 2.3
---

# Counter-Hate Dataset

A large-scale multimodal dataset for studying fairness and bias in hate speech detection systems with counterfactual augmentation.

**🎯 Target venue:** ACM Multimedia 2026

## Dataset Description

This dataset contains **18,000 text-image pairs** labeled with one of 8 classes (4 hate speech classes and 4 non-hate classes), with varying levels of protected group representation. It was created to evaluate whether Counterfactual Data Augmentation (CDA) introduces or amplifies bias in hate speech detection models.

### Key Features:

- **Multimodal**: Text + generated images (from T2I models such as Z-Image-Turbo)
- **18,000 samples**: 6,000 original texts + 12,000 counterfactual variants via identity-term substitution
- **8 hate speech classes**:
  - hate_race, hate_religion, hate_gender, hate_other
  - offensive_non_hate, neutral_discussion, counter_speech, ambiguous
- **8 protected groups**:
  - race/ethnicity, religion, gender, sexual_orientation, national_origin/citizenship, disability, age, multiple/none
- **Stratified splits**: train/val/test (70% / 15% / 15%) with originals only in val/test
- **Experimental conditions**: nCF (6K originals) and CF (18K with counterfactuals)

## Dataset Structure

```
counter-hate-dataset/
├── data/
│   ├── train.csv        # 12,597 training samples (includes counterfactuals)
│   ├── validation.csv   # 2,700 validation samples (originals only)
│   ├── test.csv         # 2,703 test samples (originals only)
│   └── dataset.csv      # Combined all samples
├── images/
│   ├── hate/            # 9,000 images for hate speech samples
│   └── non_hate/        # 9,000 images for non-hate samples
└── README.md            # This file
```

## Features

Each row in the CSV files contains:

| Column | Type | Description |
|--------|------|-------------|
| `original_sample_id` | str | Unique identifier for the original sample (e.g., HS_HATE_RACE_0001) |
| `counterfactual_id` | str | Unique ID for this variant (original or counterfactual_X) |
| `text` | str | The actual text content |
| `class_label` | str | One of 8 hate speech categories |
| `target_group` | str | Protected group mentioned in text (8 groups) |
| `polarity` | str | 'hate' or 'non-hate' |
| `hate_score` | float | Numeric hate score from annotations |
| `confidence` | float | Confidence in the annotation (0-10 scale) |
| `cf_type` | str | 'original' or 'counterfactual_1', 'counterfactual_2', etc. |
| `t2i_prompt` | str | Text-to-image generation prompt |
| `image_path` | str | Relative path to corresponding PNG image |
| `split` | str | 'train', 'validation', or 'test' |

## Splits

Training conditions and split statistics:

### Condition nCF (No Counterfactual)

- Train: 4,158 original samples
- Validation: 891 original samples
- Test: 892 original samples
- **Total: 5,841 samples** as stated; the three counts above sum to 5,941

### Condition CF (With Counterfactual)

- Train: 12,597 samples (includes counterfactuals)
- Validation: 2,700 samples (originals only)
- Test: 2,703 samples (originals only)
- **Total: 18,000 samples**

All splits are stratified by `class_label` to preserve class distribution.

## Class Distribution

Each class contains 750 original samples:

- **Hate Speech (4 classes)**:
  - hate_race: 750
  - hate_religion: 750
  - hate_gender: 750
  - hate_other: 750
- **Non-Hate (4 classes)**:
  - offensive_non_hate: 750
  - neutral_discussion: 750
  - counter_speech: 750
  - ambiguous: 750

## Image Generation

Images were generated using the **Z-Image-Turbo** text-to-image model with prompts derived from the text content. Each image corresponds to one text sample and represents its hate speech category and context. Images are stored as PNG files organized by category.

### Image Organization

```
images/
├── hate/
│   ├── Hate_Gender/    # Gender-targeted hate speech
│   ├── Hate_Others/    # Other hate categories
│   ├── Hate_race/      # Race-targeted hate speech
│   └── ...
└── non_hate/
    ├── generated_images-ambigious/
    ├── generated_images-counter-speech/
    ├── generated_images-neutral/
    └── ...
```

## Usage

### Load with Hugging Face Datasets

```python
from datasets import load_dataset

# Load the full dataset
dataset = load_dataset('vs16/counter-hate-dataset')

# Access specific split
train_data = dataset['train']
val_data = dataset['validation']
test_data = dataset['test']

# Access a sample
sample = train_data[0]
print(sample['text'])
print(sample['image_path'])
```

### Load with Pandas

```python
import pandas as pd
from PIL import Image

# Load a specific split
train_df = pd.read_csv('data/train.csv')

# Access a row
sample = train_df.iloc[0]
print(sample['text'])
print(sample['class_label'])

# Load the image. image_path is a relative path; drop the "images/" prefix
# here if the stored value already includes it.
image = Image.open(f"images/{sample['image_path']}")
image.show()
```

### Stratified Train/Val/Test Split

```python
import pandas as pd

df = pd.read_csv('data/dataset.csv')

# Use the provided split column; no re-splitting is needed
train_df = df[df['split'] == 'train']
val_df = df[df['split'] == 'validation']
test_df = df[df['split'] == 'test']
```

## Experimental Methodology

### Counterfactual Generation

Counterfactual samples were generated via **identity-term substitution**:

1. Extract identity terms (group descriptors) from the original text
2. Replace them with alternative terms from the same attribute dimension
3. Examples:
   - "Muslim" → "Christian", "Jewish", "Hindu", etc.
   - "Black" → "Asian", "Hispanic", "Native American", etc.
   - "woman" → "man", etc.

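The substitution steps above can be illustrated with a minimal sketch. The term table, the function name `make_counterfactuals`, and the limit of two variants per text are assumptions for illustration only (the limit matches the stated ratio of 12,000 counterfactuals to 6,000 originals); the card does not publish the actual term lists or generation code.

```python
import re

# Hypothetical substitution table, mirroring only the examples in the card.
# Each identity term maps to alternatives from the same attribute dimension.
SUBSTITUTIONS = {
    "muslim": ["Christian", "Jewish", "Hindu"],
    "black": ["Asian", "Hispanic", "Native American"],
    "woman": ["man"],
}


def make_counterfactuals(text, max_variants=2):
    """Return up to max_variants copies of text with the first matching
    identity term replaced by an alternative from the same dimension."""
    for term, alternatives in SUBSTITUTIONS.items():
        # Whole-word, case-insensitive match so "woman" does not hit "womanly".
        pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
        if pattern.search(text):
            return [pattern.sub(alt, text) for alt in alternatives[:max_variants]]
    return []  # no identity term found, so no counterfactual is produced


variants = make_counterfactuals("A Muslim family moved in next door.")
for i, variant in enumerate(variants, start=1):
    print(f"counterfactual_{i}: {variant}")
```

A plain word-level swap like this cannot tell an identity use of "Black" from a color use, which is one reason a real pipeline would likely need a curated term list or manual review.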
### Data Preparation (Conditions)

- **nCF (No Counterfactual)**: Only original 6,000 samples
- **CF (With Counterfactual)**: 6,000 originals + 12,000 counterfactual variants

### Fair Evaluation Protocol

- Validation and test sets contain **only original samples** for both conditions
- Training set includes counterfactuals in CF condition
- Prevents data leakage and ensures comparable evaluation

## Citation

If you use this dataset in your research, please cite:

```bibtex
@dataset{vs16_counter_hate_2026,
  title={Counter-Hate Dataset: A Multimodal Benchmark for Studying Fairness and Bias in Hate Speech Detection},
  year={2026},
  publisher={Hugging Face Datasets},
  url={https://huggingface.co/datasets/vs16/counter-hate-dataset}
}
```

And the original work:

```bibtex
@article{kennedy2020measuring,
  title={Measuring the Reliability of Hate Speech Annotations: The Case of the European Parliament Debates},
  author={Kennedy, Bing and Atkinson, David and others},
  year={2020}
}
```

## License

The dataset is provided for **research purposes only**. Users must comply with applicable laws and ethical guidelines when working with this data.

## Ethical Considerations

⚠️ **Important Note**: This dataset contains hate speech and offensive language for research purposes only.

### Responsible Use Guidelines

1. **Research Purpose Only**: Use solely for studying bias, fairness, and detection systems
2. **Do Not Amplify**: Do not use to train systems that amplify or spread hate speech
3. **Sensitivity**: Be aware of the sensitive nature of the content
4. **Attribution**: Always cite and credit the dataset source
5. **Report Issues**: Report any misuse or ethical concerns
6. **Institutional Review**: Consider IRB approval for related human studies

### Protected Groups Representation

The dataset explicitly includes diverse protected groups to ensure comprehensive bias evaluation. This is intentional and necessary for fairness research.

## Dataset Statistics

- **Total samples**: 18,000 text-image pairs
- **Original samples**: 6,000
- **Counterfactual variants**: 12,000
- **Image files**: 18,000 PNG images
- **Total size**: ~2.3 GB (with images)
- **CSV size**: ~8 MB (combined)
- **Average text length**: 150-300 characters
- **Image resolution**: 512x512px (typical for T2I models)

## Reproducibility

All splits and train/val/test assignments are deterministic and reproducible:

- **Random seed**: 42
- **Stratification**: By class_label (8 classes)
- **Split level**: By original_sample_id (group-level splitting)
- **Canonical source**: `canonical_splits.json` defines all splits

## Known Limitations

1. Images are synthetically generated and may not perfectly represent real-world scenarios
2. Counterfactual generation via term substitution is limited to identity/demographic attributes
3. Limited to English text
4. Hate speech prevalence is intentionally high (for detection research)
5. May not generalize to all hate speech types or contexts

## Questions & Support

For issues, questions, or suggestions:

- 🐛 Dataset issues: Report on GitHub
- 📧 Contact: via Hugging Face dataset page
- 💬 Discussions: Use the Hugging Face Discussions tab

## Related Work

This dataset builds on and relates to:

- UCBerkeley-DLab's "Measuring Hate Speech" dataset
- HateBERT and other hate speech detection models
- Fairness and bias research in NLP
- Counterfactual data augmentation literature

---

**Dataset Version**: 1.0

**Last Updated**: April 2, 2026

**Status**: Ready for research use
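
The group-level splitting described under Reproducibility, and the nCF and CF conditions, can be checked from the `original_sample_id`, `cf_type`, and `split` columns alone. The following is a minimal sketch on toy rows, not the authors' tooling: the helper names `groups_in_multiple_splits` and `training_rows` are hypothetical, and the second sample ID is made up for illustration.

```python
from collections import defaultdict


def groups_in_multiple_splits(rows):
    """Return original_sample_id values whose variants land in more than one split."""
    splits_by_group = defaultdict(set)
    for row in rows:
        splits_by_group[row["original_sample_id"]].add(row["split"])
    return sorted(g for g, splits in splits_by_group.items() if len(splits) > 1)


def training_rows(rows, condition):
    """Select training rows: 'nCF' keeps originals only, 'CF' keeps everything."""
    train = [r for r in rows if r["split"] == "train"]
    if condition == "nCF":
        return [r for r in train if r["cf_type"] == "original"]
    if condition == "CF":
        return train
    raise ValueError(f"unknown condition: {condition!r}")


# Toy rows using the card's column names (IDs are illustrative).
rows = [
    {"original_sample_id": "HS_HATE_RACE_0001", "cf_type": "original", "split": "train"},
    {"original_sample_id": "HS_HATE_RACE_0001", "cf_type": "counterfactual_1", "split": "train"},
    {"original_sample_id": "HS_HATE_RACE_0002", "cf_type": "original", "split": "test"},
]

print("groups spanning splits:", groups_in_multiple_splits(rows))
print("nCF train size:", len(training_rows(rows, "nCF")))
print("CF train size:", len(training_rows(rows, "CF")))
```

To run the same checks on the real data, the rows could be read from `data/dataset.csv` with `csv.DictReader`, which yields dictionaries keyed by these column names.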

Provider:

vs16