
sophieb/dynamically_generated_hate_speech_dataset

Source: Hugging Face. Updated: 2022-06-25. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/sophieb/dynamically_generated_hate_speech_dataset
Resource description:

# Dataset card for the dynamically generated hate speech detection dataset

## Dataset summary

This dataset was dynamically generated for training and improving hate speech detection models. A group of trained annotators generated and labeled challenging examples designed to trick hate speech models, so that the models could then be improved. The dataset contains about 40,000 examples, of which 54% are labeled as hate speech. It also provides the target of the hate speech, including vulnerable, marginalized, and discriminated-against groups. Overall, this is a balanced dataset, which sets it apart from other hate speech datasets available on the web.

The dataset was presented in the 2021 article [Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection](https://aclanthology.org/2021.acl-long.132.pdf). The article describes the process for generating and annotating the data, and how the authors used the generated data to train and improve hate speech detection models.

The full author list is: Bertie Vidgen (The Alan Turing Institute), Tristan Thrush (Facebook), Zeerak Waseem (University of Sheffield), and Douwe Kiela (Facebook).
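The 54% figure above is a label share, and a reader may want to check it on their own copy of the data. The following is a minimal sketch rather than the authors' code: the column name `label`, the positive value `"hate"`, and the helper names `label_share` and `load_rows` are all assumptions for illustration, since the card does not list the schema. Loading relies on the third-party `datasets` library and network access, and the repository ID is taken from the listing above.

```python
from collections import Counter


def label_share(rows, label_field="label", positive="hate"):
    """Return the fraction of rows whose label equals `positive`.

    `label_field` and `positive` are assumed names; adjust them to the
    actual schema of the dataset.
    """
    counts = Counter(row[label_field] for row in rows)
    total = sum(counts.values())
    return counts[positive] / total if total else 0.0


def load_rows(repo_id="sophieb/dynamically_generated_hate_speech_dataset"):
    """Collect the rows of every split into one list (needs network access)."""
    from datasets import load_dataset  # third-party: pip install datasets

    splits = load_dataset(repo_id)  # a DatasetDict keyed by split name
    return [row for split in splits.values() for row in split]


# Toy rows standing in for real data, so the sketch runs offline.
toy_rows = [
    {"label": "hate"},
    {"label": "nothate"},
    {"label": "hate"},
    {"label": "nothate"},
]
print(label_share(toy_rows))
```

On the real data, `label_share(load_rows())` should come out near 0.54 if the assumed column name and label value match the actual schema.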
Provided by:
sophieb
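Summary of original information

Dataset overview

Dataset name

Dynamically generated dataset for hate speech detection

Dataset purpose

Training and improving hate speech detection models.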

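Dataset contents

  • Contains about 40,000 examples.
  • 54% of the examples are labeled as hate speech.
  • Provides the target of the hate speech, including vulnerable, marginalized, and discriminated-against groups.

Dataset characteristics

  • A balanced dataset, unlike other hate speech datasets available on the web.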

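Related literature

The dataset was introduced in the 2021 article "Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection", which describes the data generation and annotation process and how the data was used to train and improve hate speech detection models.

Authors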

  • Bertie Vidgen (The Alan Turing Institute)
  • Tristan Thrush (Facebook)
  • Zeerak Waseem (University of Sheffield)
  • Douwe Kiela (Facebook)