LennardZuendorf/Dynamically-Generated-Hate-Speech-Dataset

Name: LennardZuendorf/Dynamically-Generated-Hate-Speech-Dataset
Creator: LennardZuendorf
Published: 2023-05-16 16:01:46
License: no description provided

Hugging Face | Updated 2023-05-16 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/LennardZuendorf/Dynamically-Generated-Hate-Speech-Dataset

Resource description:

---
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- not-for-all-audiences
- legal
pretty_name: dynamically generated hate speech dataset
---

# Dataset Card for dynamically generated hate speech dataset

## Dataset Description

- **Homepage:** [GitHub](https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset)
- **Point of Contact:** [bertievidgen@gmail.com](mailto:bertievidgen@gmail.com)

### Dataset Summary

This is a copy of the Dynamically-Generated-Hate-Speech-Dataset, presented in [this paper](https://arxiv.org/abs/2012.15761) by **Bertie Vidgen**, **Tristan Thrush**, **Zeerak Waseem** and **Douwe Kiela**.

## Original README from [GitHub](https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset/blob/main/README.md)

## Dynamically-Generated-Hate-Speech-Dataset

ReadMe for v0.2 of the Dynamically Generated Hate Speech Dataset from Vidgen et al. (2021). If you use the dataset, please cite our paper in the Proceedings of ACL 2021, and available on [Arxiv](https://arxiv.org/abs/2012.15761).

Contact Dr. Bertie Vidgen if you have feedback or queries: bertievidgen@gmail.com.

The full author list is: Bertie Vidgen (The Alan Turing Institute), Tristan Thrush (Facebook AI Research), Zeerak Waseem (University of Sheffield) and Douwe Kiela (Facebook AI Research). This paper is an output of the Dynabench project: https://dynabench.org/tasks/5#overall

### Dataset descriptions

v0.2.2.csv is the full dataset used in our ACL paper.

v0.2.3.csv removes duplicate entries, all of which occurred in round 1. Duplicates come from two sources: (1) annotators entering the same content multiple times and (2) different annotators entering the same content. The duplicates are interesting for understanding the annotation process, and the challenges of dynamically generating datasets. However, they are likely to be less useful for training classifiers and so are removed in v0.2.3. We did not lower case the text before removing duplicates as capitalisations contain potentially useful signals.

### Overview

The Dynamically Generated Hate Speech Dataset is provided in one table.

- 'acl.id' is the unique ID of the entry.
- 'Text' is the content which has been entered. All content is synthetic.
- 'Label' is a binary variable, indicating whether or not the content has been identified as hateful. It takes two values: hate, nothate.
- 'Type' is a categorical variable, providing a secondary label for hateful content. For hate it can take five values: Animosity, Derogation, Dehumanization, Threatening and Support for Hateful Entities. Please see the paper for more detail. For nothate the 'type' is 'none'. In round 1 the 'type' was not given and is marked as 'notgiven'.
- 'Target' is a categorical variable, providing the group that is attacked by the hate. It can include intersectional characteristics and multiple groups can be identified. For nothate the 'target' is 'none'. Note that in round 1 the 'target' was not given and is marked as 'notgiven'.
- 'Level' reports whether the entry is original content or a perturbation.
- 'Round' is a categorical variable. It gives the round of data entry (1, 2, 3 or 4) with a letter for whether the entry is original content ('a') or a perturbation ('b'). Perturbations were not made for round 1.
- 'Round.base' is a categorical variable. It gives the round of data entry, indicated with just a number (1, 2, 3 or 4).
- 'Split' is a categorical variable. It gives the data split that the entry has been assigned to. This can take the values 'train', 'dev' and 'test'. The choice of splits is explained in the paper.
- 'Annotator' is a categorical variable. It gives the annotator who entered the content. Annotator IDs are random alphanumeric strings. There are 20 annotators in the dataset.
- 'acl.id.matched' is the ID of the matched entry, connecting the original (given in 'acl.id') and the perturbed version.

For identities (recorded under 'Target') we use shorthand labels to construct the dataset, which can be converted (and grouped) as follows:

- none -> for non hateful entries
- NoTargetRecorded -> for hateful entries with no target recorded
- mixed -> Mixed race background
- ethnic minority -> Ethnic Minorities
- indig -> Indigenous people
- indigwom -> Indigenous Women
- non-white -> Non-whites (attacked as 'non-whites', rather than specific non-white groups which are generally addressed separately)
- trav -> Travellers (including Roma, gypsies)
- bla -> Black people
- blawom -> Black women
- blaman -> Black men
- african -> African (all 'African' attacks will also be an attack against Black people)
- jew -> Jewish people
- mus -> Muslims
- muswom -> Muslim women
- wom -> Women
- trans -> Trans people
- gendermin -> Gender minorities
- bis -> Bisexual
- gay -> Gay people (both men and women)
- gayman -> Gay men
- gaywom -> Lesbians
- dis -> People with disabilities
- working -> Working class people
- old -> Elderly people
- asi -> Asians
- asiwom -> Asian women
- east -> East Asians
- south -> South Asians (e.g. Indians)
- chinese -> Chinese people
- pak -> Pakistanis
- arab -> Arabs, including people from the Middle East
- immig -> Immigrants
- asylum -> Asylum seekers
- ref -> Refugees
- for -> Foreigners
- eastern european -> Eastern Europeans
- russian -> Russian people
- pol -> Polish people
- hispanic -> Hispanic people, including latinx and Mexicans
- nazi -> Nazis ('Support' type of hate)
- hitler -> Hitler ('Support' type of hate)

### Code

Code was implemented using the Hugging Face transformers library.

## Additional Information

### Licensing Information

The original repository does not provide any license, but is free for use with proper citation of the original paper in the Proceedings of ACL 2021, available on [Arxiv](https://arxiv.org/abs/2012.15761).

### Citation Information

Cite as [arXiv:2012.15761](https://arxiv.org/abs/2012.15761) or [https://doi.org/10.48550/arXiv.2012.15761](https://doi.org/10.48550/arXiv.2012.15761).

Provided by:

LennardZuendorf

Summary of original information

Dataset Card for Dynamically Generated Hate Speech Dataset

Dataset description

Dataset summary

The Dynamically Generated Hate Speech Dataset was introduced in a paper by the following authors:

Bertie Vidgen
Tristan Thrush
Zeerak Waseem
Douwe Kiela

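Dataset description

Versions: v0.2.2.csv and v0.2.3.csv
Dataset structure:
- acl.id: the unique ID of the entry.
- Text: the content that was entered. All content is synthetic.
- Label: a binary variable indicating whether the content has been identified as hateful. It takes the value hate or nothate.
- Type: a categorical variable giving a secondary label for hateful content. For hate it can take five values: Animosity, Derogation, Dehumanization, Threatening and Support for Hateful Entities. For nothate the type is none. In round 1 the type was not given and is marked as notgiven.
- Target: a categorical variable giving the group attacked by the hate. It can include intersectional characteristics, and multiple groups can be identified. For nothate the target is none. In round 1 the target was not given and is marked as notgiven.
- Level: reports whether the entry is original content or a perturbation.
- Round: a categorical variable giving the round of data entry (1, 2, 3 or 4) plus a letter for whether the entry is original content (a) or a perturbation (b). Round 1 has no perturbations.
- Round.base: a categorical variable giving the round of data entry as a number only (1, 2, 3 or 4).
- Split: a categorical variable giving the data split the entry was assigned to. It takes the values train, dev and test.
- Annotator: a categorical variable giving the annotator who entered the content. Annotator IDs are random alphanumeric strings. There are 20 annotators in the dataset.
- acl.id.matched: the ID of the matched entry, connecting the original entry and its perturbed version.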
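
The Round and acl.id.matched fields described above can be combined to recover original/perturbation pairs. The following is a minimal sketch using only the Python standard library. It runs on a few synthetic placeholder rows rather than the real CSV, and it assumes that the CSV header names match the field names listed above and that a Round value is a digit optionally followed by 'a' or 'b'; both should be checked against the actual file.

```python
import csv
import io

# Synthetic stand-in rows; the texts are placeholders, not dataset content.
# ASSUMPTION: header names follow the field names listed above.
SAMPLE = """acl.id,Text,Label,Round,acl.id.matched
acl1,placeholder a,hate,1,
acl2,placeholder b,hate,2a,acl3
acl3,placeholder b edited,nothate,2b,acl2
"""


def parse_round(value):
    """Split a 'Round' value such as '2b' into (2, 'b').

    A bare round number (e.g. '1') yields (1, None).
    """
    base = int(value[0])
    letter = value[1:] or None
    return base, letter


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_id = {row["acl.id"]: row for row in rows}

# Collect each matched pair once, regardless of which side holds the link.
pairs = set()
for row in rows:
    matched = row["acl.id.matched"]
    if matched and matched in by_id:
        pairs.add(tuple(sorted((row["acl.id"], matched))))

print(parse_round("2b"))
print(pairs)
```

Sorting each pair before adding it to a set avoids assuming whether the link is stored on the original entry, the perturbation, or both.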

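Identity labels

none: non-hateful entries
NoTargetRecorded: hateful entries with no target recorded
mixed: Mixed race background
ethnic minority: Ethnic minorities
indig: Indigenous people
indigwom: Indigenous women
non-white: Non-whites
trav: Travellers (including Roma, gypsies)
bla: Black people
blawom: Black women
blaman: Black men
african: African
jew: Jewish people
mus: Muslims
muswom: Muslim women
wom: Women
trans: Trans people
gendermin: Gender minorities
bis: Bisexual
gay: Gay people (both men and women)
gayman: Gay men
gaywom: Lesbians
dis: People with disabilities
working: Working class people
old: Elderly people
asi: Asians
asiwom: Asian women
east: East Asians
south: South Asians (e.g. Indians)
chinese: Chinese people
pak: Pakistanis
arab: Arabs, including people from the Middle East
immig: Immigrants
asylum: Asylum seekers
ref: Refugees
for: Foreigners
eastern european: Eastern Europeans
russian: Russian people
pol: Polish people
hispanic: Hispanic people, including latinx and Mexicans
nazi: Nazis ('Support' type of hate)
hitler: Hitler ('Support' type of hate)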
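
The shorthand list above lends itself to a simple lookup table. The sketch below expands a raw Target value into readable group names. Only a subset of the mapping is included to keep it short. The function name `expand_targets` is hypothetical, and the comma separator for multi-group values is an assumption: the source says multiple groups can be identified but does not say how they are delimited.

```python
# Shorthand -> readable name, copied from the identity label list above
# (subset only; extend with the remaining entries as needed).
TARGET_NAMES = {
    "none": "Non-hateful entry",
    "NoTargetRecorded": "Hateful entry with no target recorded",
    "bla": "Black people",
    "wom": "Women",
    "mus": "Muslims",
    "jew": "Jewish people",
    "trans": "Trans people",
    "immig": "Immigrants",
    "eastern european": "Eastern Europeans",
}


def expand_targets(raw, sep=","):
    """Split a raw 'Target' value and map each shorthand to a readable name.

    ASSUMPTION: multiple groups are separated by `sep`; verify this
    against the CSV. Unknown shorthands are returned unchanged.
    """
    parts = [part.strip() for part in raw.split(sep) if part.strip()]
    return [TARGET_NAMES.get(part, part) for part in parts]


print(expand_targets("bla, wom"))
print(expand_targets("eastern european"))
```

Passing unknown values through unchanged keeps markers such as notgiven (used in round 1) visible instead of silently dropping them.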

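Code

Code was implemented using the Hugging Face transformers library.

Additional information

Licensing information

The original repository does not provide any license, but the dataset is free to use with proper citation of the original paper.

Citation information

Cite as arXiv:2012.15761 or https://doi.org/10.48550/arXiv.2012.15761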

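Collected summary

Dataset introduction

Construction

The Dynamically Generated Hate Speech Dataset was built through a multi-round dynamic generation process in which twenty annotators wrote and labelled content. Data entry ran over four rounds: round 1 contains only original content, while rounds 2 to 4 contain both original entries and perturbed versions of them, which adds diversity to the data. Duplicate entries, all of which occurred in round 1, are removed in v0.2.3. The text was not lower-cased before deduplication, because capitalisation may carry useful signal. The dynamic process is intended to give models a harder and more realistic training challenge than a static collection.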
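
The deduplication behaviour described above (exact matches removed, capitalisation preserved) can be illustrated as follows. This is a sketch under stated assumptions, not the authors' script: the function name `drop_exact_duplicates` is hypothetical, the rows are synthetic placeholders, and keeping the first occurrence is an arbitrary choice made here.

```python
def drop_exact_duplicates(rows, key="Text"):
    """Keep the first row for each distinct text, compared case-sensitively."""
    seen = set()
    kept = []
    for row in rows:
        text = row[key]
        if text not in seen:  # no .lower(): 'Placeholder' != 'placeholder'
            seen.add(text)
            kept.append(row)
    return kept


# Placeholder rows covering both duplicate sources named in the README.
rows = [
    {"Text": "placeholder one", "Annotator": "a1"},
    {"Text": "placeholder one", "Annotator": "a1"},  # same annotator, same text
    {"Text": "placeholder one", "Annotator": "b2"},  # different annotator, same text
    {"Text": "Placeholder One", "Annotator": "b2"},  # differs only in case: kept
]

deduped = drop_exact_duplicates(rows)
print([row["Text"] for row in deduped])
```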

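Features

The dataset's main strength is its range of annotation dimensions: a binary hate label, a hate type, the targeted group, and the round of data entry. Hate types are divided into five classes: animosity, derogation, dehumanization, threatening, and support for hateful entities. Targets span race, religion, gender, sexual orientation and other sensitive dimensions, and intersectional characteristics can be recorded. Perturbed entries add diversity and difficulty, which may help models generalise. The data also comes pre-split into train, dev and test sets, giving researchers a ready-made evaluation setup.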

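Usage

The dataset is tagged for text classification and text generation tasks, and is most directly useful for hate speech detection models. Researchers can load the dataset file and use the text and its multi-dimensional labels for training and evaluation. During preprocessing, consider preserving the original capitalisation, since it may carry signal. Depending on the research goal, use either the full version (v0.2.2) or the deduplicated version (v0.2.3). The splits are marked explicitly in the Split column, which supports a standard machine learning workflow. Follow academic norms by citing the original paper, and take the sensitivity of the content into account so that use complies with ethical and legal requirements.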
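
As a rough illustration of the workflow above, the sketch below groups rows by their Split value and encodes the binary label as an integer while leaving the text's capitalisation untouched. The rows are synthetic placeholders. The column names are assumed to match the field names in the dataset card, and the 0/1 encoding is an arbitrary choice for this example.

```python
from collections import defaultdict

# ASSUMPTION: this integer encoding is our own choice, not part of the dataset.
LABEL_TO_INT = {"nothate": 0, "hate": 1}


def partition(rows):
    """Group (text, label_id) pairs by the value of the 'Split' column."""
    splits = defaultdict(list)
    for row in rows:
        # Text is kept exactly as entered (no lower-casing).
        splits[row["Split"]].append((row["Text"], LABEL_TO_INT[row["Label"]]))
    return dict(splits)


# Placeholder rows standing in for rows read from the CSV.
rows = [
    {"Text": "Placeholder A", "Label": "hate", "Split": "train"},
    {"Text": "placeholder b", "Label": "nothate", "Split": "train"},
    {"Text": "placeholder c", "Label": "hate", "Split": "dev"},
    {"Text": "placeholder d", "Label": "nothate", "Split": "test"},
]

splits = partition(rows)
for name in ("train", "dev", "test"):
    print(name, len(splits.get(name, [])))
```

With the real file, the `rows` list could be produced by `csv.DictReader`, after confirming the actual header names.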

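Background and challenges

Background overview

In natural language processing, automatic detection of hate speech is a key problem for keeping online environments safe. The Dynamically-Generated-Hate-Speech-Dataset, by Bertie Vidgen, Tristan Thrush, Zeerak Waseem and Douwe Kiela and published at ACL 2021, is an output of the Dynabench project and uses dynamic generation to address the evolving and varied nature of hate speech. The dataset targets text classification and uses synthetic content to model hateful expression. Its core research question is how to improve model recognition of complex and implicit hate speech, with relevance to social media content moderation and ethical AI.

Current challenges

The dataset addresses challenges that arise from the dynamic, context-dependent nature of language in hate speech detection, including accurate classification of implicit, sarcastic or newly emerging expressions. During construction, the researchers faced consistency problems in generating and labelling synthetic content, such as handling duplicate entries and the complexity of identifying target groups across rounds. A further challenge is keeping the data balanced and representative across sensitive dimensions such as race and gender while avoiding annotation bias.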

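Common scenarios

Typical use cases

Through its dynamic generation process, the dataset gives researchers a varied text classification benchmark for hate speech detection. Its value lies in covering the full chain from original content to adversarial perturbations. Typical uses include training and evaluating hate speech classifiers, especially for fine-grained recognition of hate types (such as animosity, derogation and dehumanization). The train, dev and test splits support a standard machine learning workflow, so models can learn from synthetic data and then be assessed on how well they generalise.

Derived related work

Work building on the dataset has focused mainly on new hate speech detection models and their evaluation. For example, researchers have used its dynamic generation properties to develop adversarial training methods that make models more robust to perturbed samples. The fine-grained label scheme has motivated multi-task learning setups that predict hate type and target group together. The dataset has also been used in cross-dataset comparison studies of model generalisation. In addition, ethical discussion around the dataset has contributed to academic debate on annotation bias, algorithmic fairness and transparent AI.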

Recent research on the dataset