Implicit Hate
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/gt-salt/implicit-hate
Overview:
This dataset is a subset of the Implicit Hate dataset containing examples of implicit hate speech and neutral speech. It is intended for analyzing the representational harm that such speech causes to marginalized groups. For this analysis, the subset was downsampled so that harmful and harmless examples are equal in number. The task uses safety scores to measure representational harm in language models.
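The class balancing described above could be sketched as follows. This is a minimal illustration, not the dataset authors' procedure: the function name `downsample_balanced`, the `label` field, and the label values `implicit_hate` and `neutral` are hypothetical, and the actual sampling method and random seed are not stated in this listing.

```python
import random
from collections import defaultdict


def downsample_balanced(examples, label_key="label", seed=0):
    """Return a shuffled subset with equally many examples per label.

    `examples` is a list of dicts; `label_key` names the (assumed) field
    that holds the class label of each example.
    """
    # Group the examples by their label.
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)

    # The smallest class determines how many examples each class keeps.
    n = min(len(group) for group in by_label.values())

    # Sample n examples per class without replacement, with a fixed seed
    # so the subset is reproducible.
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))

    rng.shuffle(balanced)
    return balanced
```

Called on a list of records such as `{"text": ..., "label": "neutral"}`, the function keeps as many examples of each class as the smallest class contains, which matches the equal-count property described in the overview.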



