
Multiclass English Hate Speech Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/wfsyh6jx3y
Description:
Multiclass English Hate Speech Dataset is an extended, fine-grained version of the original binary-labelled hate-speech dataset released as part of the TweetEval benchmark (Hate sub-task). The original dataset contained English posts annotated only as Hate or Non-Hate; this work applies a detailed manual re-annotation process to assign multiple specific hate-speech categories. This provides richer granularity and enables more accurate modelling of real-world online hate. All posts were manually reviewed and reclassified by trained annotators following a structured annotation guideline.

The dataset introduces a comprehensive multiclass taxonomy capturing different forms of explicit and implicit hate, such as:
- Gender-Based Hate Speech (Misogyny)
- Gender-Based Hate Speech (Misandry)
- Immigration & Xenophobic Hate Speech (Anti-Immigrant)
- Immigration & Xenophobic Hate Speech (Anti-Refugee)
- Immigration & Xenophobic Hate Speech (Xenophobia)

Through this re-annotation effort, the dataset transforms a simple binary classification problem into a 14-class fine-grained hate-speech categorization task, enabling more robust research on model sensitivity, bias analysis, safety evaluation, and explainability.

The dataset is suitable for:
- Content-moderation research and safety evaluation
- Sociolinguistic analysis of targeted abuse

All user identifiers and personally identifiable information (PII) have been removed or masked to ensure privacy and ethical compliance. The dataset includes the anonymized text, the newly assigned multiclass label, and mapping metadata to the original TweetEval record.

This resource aims to support researchers, practitioners, and policymakers in building safer and more responsible AI systems capable of detecting nuanced forms of online hate and targeted harassment.
Created:
2025-11-24