Shubhi324/measuring-hate-speech
Source: Hugging Face. Updated 2025-12-12; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Shubhi324/measuring-hate-speech
Description:
This is a public release of the dataset described in Kennedy et al. (2020) and Sachdeva et al. (2022). It consists of 39,565 comments annotated by 7,912 annotators, for 135,556 combined rows. The primary outcome variable is the "hate speech score", but the 10 constituent ordinal labels (sentiment, (dis)respect, insult, humiliation, inferior status, violence, dehumanization, genocide, attack/defense, hate speech benchmark) can also be treated as outcomes. The dataset includes 8 target identity groups (race/ethnicity, religion, national origin/citizenship, gender, sexual orientation, age, disability, political ideology) with 42 target identity subgroups, as well as 6 annotator demographic categories with 40 subgroups. The hate speech score incorporates an item response theory (IRT) adjustment that estimates variation in how annotators interpret the labeling guidelines. Key columns include the hate speech score, the comment text, the comment ID, and the annotator ID, among others.
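Because the data has more rows than comments (several annotator rows per comment), a common first step is to collapse it to one record per comment. The following is a minimal sketch of that grouping on a few made-up rows. The column names `comment_id`, `annotator_id`, `hate_speech_score`, and `insult` are assumptions based on the column descriptions above, not confirmed identifiers, and the values are invented for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Toy rows standing in for the real data: one row per (comment, annotator) pair.
# Column names and values are illustrative assumptions, not taken from the dataset.
rows = [
    {"comment_id": 1, "annotator_id": 10, "hate_speech_score": -1.5, "insult": 1},
    {"comment_id": 1, "annotator_id": 11, "hate_speech_score": -1.5, "insult": 2},
    {"comment_id": 2, "annotator_id": 10, "hate_speech_score": 0.75, "insult": 3},
    {"comment_id": 2, "annotator_id": 12, "hate_speech_score": 0.75, "insult": 4},
    {"comment_id": 2, "annotator_id": 13, "hate_speech_score": 0.75, "insult": 2},
]


def summarize_by_comment(rows):
    """Collapse annotator-level rows into one summary record per comment."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["comment_id"]].append(row)

    summary = {}
    for comment_id, group in grouped.items():
        summary[comment_id] = {
            # Count distinct annotators who labeled this comment.
            "n_annotators": len({r["annotator_id"] for r in group}),
            # Average the score over the comment's rows.
            "hate_speech_score": mean(r["hate_speech_score"] for r in group),
            # Average one ordinal label as an example of a constituent outcome.
            "mean_insult": mean(r["insult"] for r in group),
        }
    return summary


summary = summarize_by_comment(rows)
for comment_id, stats in sorted(summary.items()):
    print(comment_id, stats)
```

Loading the real data (for instance with the Hugging Face `datasets` library) is left out so the sketch runs offline; the same grouping applies once the actual column names are checked against the dataset.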
Provider:
Shubhi324



