sophieb/dynamically_generated_hate_speech_dataset
Source: Hugging Face. Updated 2022-06-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/sophieb/dynamically_generated_hate_speech_dataset
Resource description:
# Dataset card for the dynamically generated hate speech detection dataset
## Dataset summary
This dataset was dynamically generated for training and improving hate speech detection models. A group of trained annotators generated and labeled challenging examples designed to trick hate speech models, so that the models could then be improved. The dataset contains about 40,000 examples, of which 54% are labeled as hate speech. It also records the target of the hate speech, including vulnerable, marginalized, and discriminated groups. Overall, this is a balanced dataset, which sets it apart from the hate speech datasets already available on the web.
This dataset was presented in the article [Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection](https://aclanthology.org/2021.acl-long.132.pdf), published in 2021. The article describes the process for generating and annotating the data, and how the authors used the generated data to train and improve hate speech detection models. The full author list is: Bertie Vidgen (The Alan Turing Institute), Tristan Thrush (Facebook), Zeerak Waseem (University of Sheffield), and Douwe Kiela (Facebook).
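The class balance mentioned in the summary (about 54% hate speech) can be checked once the data is loaded. The following is a minimal sketch, not part of the dataset card: the helper names `label_shares` and `load_rows`, the `label` column, the `"hate"`/`"nothate"` values, and the `train` split are all assumptions and should be verified against the actual files. `load_rows` relies on `load_dataset` from the Hugging Face `datasets` library and needs network access, so the demonstration below runs on toy rows instead.

```python
from collections import Counter
from typing import Iterable, Mapping


def label_shares(rows: Iterable[Mapping[str, str]], label_key: str = "label") -> dict[str, float]:
    """Return the fraction of rows carrying each label value."""
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def load_rows(repo_id: str = "sophieb/dynamically_generated_hate_speech_dataset",
              split: str = "train"):
    """Fetch one split from the Hugging Face Hub.

    Assumption: the repository exposes a split named "train".
    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily so label_shares works without it
    return load_dataset(repo_id, split=split)


# Toy rows standing in for the real data; column name and label values are assumed.
toy_rows = [
    {"text": "example a", "label": "hate"},
    {"text": "example b", "label": "nothate"},
    {"text": "example c", "label": "hate"},
    {"text": "example d", "label": "nothate"},
]
shares = label_shares(toy_rows)
print(shares)
```

With the real data, `label_shares(load_rows())` should report a hate speech share close to the 54% stated above, provided the assumed column name and split match the published files.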
Provider:
sophieb
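Summary of original information
Dataset overview
Dataset name
Dynamically generated dataset for hate speech detection
Dataset purpose
Training and improving hate speech detection models.
Dataset contents
- Contains about 40,000 examples.
- 54% of the examples are labeled as hate speech.
- Provides the target of the hate speech, including vulnerable, marginalized, and discriminated groups.
Dataset characteristics
- A balanced dataset, unlike other hate speech datasets available on the web.
Related literature
The dataset was presented in the 2021 article "Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection", which describes the data generation and annotation process and how the data was used to train and improve hate speech detection models.
Authors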
- Bertie Vidgen (The Alan Turing Institute)
- Tristan Thrush (Facebook)
- Zeerak Waseem (University of Sheffield)
- Douwe Kiela (Facebook)



