sophieb/dynamically_generated_hate_speech_dataset

Name: sophieb/dynamically_generated_hate_speech_dataset
Creator: sophieb
Published: 2022-06-25 18:02:18
License: No description available

Hugging Face · Updated 2022-06-25 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/sophieb/dynamically_generated_hate_speech_dataset

Resource description:

# Dataset card for dynamically generated dataset for hate speech detection

## Dataset summary

This dataset was dynamically generated for training and improving hate speech detection models. A group of trained annotators generated and labeled challenging examples designed to trick hate speech models, so that the models could then be improved. The dataset contains about 40,000 examples, of which 54% are labeled as hate speech. It also provides the target of the hate speech, including vulnerable, marginalized, and discriminated groups. Overall, this is a balanced dataset, which sets it apart from other hate speech datasets available on the web.

The dataset was presented in the article [Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection](https://aclanthology.org/2021.acl-long.132.pdf), published in 2021. The article describes the process for generating and annotating the data, and how the authors used the generated data to train and improve hate speech detection models. The full author list is: Bertie Vidgen (The Alan Turing Institute), Tristan Thrush (Facebook), Zeerak Waseem (University of Sheffield), and Douwe Kiela (Facebook).

Provider:

sophieb

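Summary of original information

Dataset overview

Dataset name

Dynamically generated dataset for hate speech detection

Purpose

Training and improving hate speech detection models.

Contents

Contains about 40,000 examples.
54% of the examples are labeled as hate speech.
Provides the target of the hate speech, including vulnerable, marginalized, and discriminated groups.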
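
As a rough illustration of the figures above, the following Python sketch computes label shares over a handful of toy records and derives approximate class counts from the rounded numbers on the card. The field names `text` and `label`, the label values `hate` and `nothate`, and the helper `label_shares` are assumptions made for illustration; they are not confirmed by the card, and the real column names may differ.

```python
from collections import Counter


def label_shares(records, label_key="label"):
    """Return each label's share of the records as a fraction of the total."""
    counts = Counter(r[label_key] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy records standing in for the real data. The field names and label
# values are hypothetical, not taken from the dataset itself.
toy = [
    {"text": "example 1", "label": "hate"},
    {"text": "example 2", "label": "nothate"},
    {"text": "example 3", "label": "hate"},
    {"text": "example 4", "label": "nothate"},
    {"text": "example 5", "label": "hate"},
]
shares = label_shares(toy)

# Back-of-envelope counts from the card's rounded figures
# ("about 40,000 examples", "54%"); the exact counts are not stated.
approx_total = 40_000
approx_hate = approx_total * 54 // 100
approx_not_hate = approx_total - approx_hate

print(shares)
print(approx_hate, approx_not_hate)
```

Applied to the actual data, the same helper would give a quick check of how close the label split is to the 54% stated on the card.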

Characteristics

A balanced dataset, unlike other hate speech datasets available on the web.

Authors

Bertie Vidgen (The Alan Turing Institute)
Tristan Thrush (Facebook)
Zeerak Waseem (University of Sheffield)
Douwe Kiela (Facebook)
