Dynamically Generated Hate Speech Dataset

Name: Dynamically Generated Hate Speech Dataset
Creator: The Alan Turing Institute
Published: 2021-06-03 16:05:32
License: No description available

Source: arXiv. Updated: 2021-06-03. Indexed: 2024-06-21.

Download link:

https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset

Resource overview:

The Dynamically Generated Hate Speech Dataset was created by researchers at The Alan Turing Institute to improve the performance and robustness of online hate detection models through a dynamic data generation process. It contains approximately 40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. Each hateful entry carries fine-grained labels for the type and target of the hate. Hateful entries make up 54% of the dataset, a substantially higher share than in comparable datasets. Models trained with this approach perform significantly better on the test sets and are harder for annotators to fool. They also perform better on the HATECHECK functional test suite, which indicates better generalization.
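The class balance quoted above (about 54% hateful entries) can be checked once the data has been downloaded. The following is a minimal sketch, not the authors' tooling: it assumes the data is available as a local CSV file, and the column name `label` and the example values `hate` and `nothate` are hypothetical placeholders that should be replaced with whatever the released file actually uses.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Mapping


def label_shares(rows: Iterable[Mapping[str, str]],
                 label_column: str = "label") -> Dict[str, float]:
    """Return the fraction of rows carrying each value of `label_column`.

    `label_column` is an assumed column name, not one confirmed by the source.
    """
    counts = Counter(row[label_column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        # No rows at all: there are no shares to report.
        return {}
    return {label: count / total for label, count in counts.items()}


def label_shares_from_csv(path: str,
                          label_column: str = "label") -> Dict[str, float]:
    """Read a CSV file with a header row and compute the label shares."""
    with open(path, newline="", encoding="utf-8") as handle:
        return label_shares(csv.DictReader(handle), label_column)
```

Calling `label_shares_from_csv` with the path of the downloaded file returns a dictionary mapping each label value to its share of the rows, which can then be compared with the proportion reported in the description.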

Provider:

The Alan Turing Institute

Created:

2021-01-01

Collection summary

Dataset introduction