Multiclass English Hate Speech Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/wfsyh6jx3y
Description:
Multiclass English Hate Speech Dataset is an extended and fine-grained version of the original binary-labelled hate-speech dataset released as part of the TweetEval benchmark (Hate sub-task). While the original dataset contained English posts annotated only as Hate or Non-Hate, this work substantially enhances it by applying a detailed manual re-annotation process to create multiple specific hate-speech categories. This provides richer granularity and enables more accurate modelling of real-world online hate.
All posts were manually reviewed and reclassified by trained annotators following a structured annotation guideline. The dataset introduces a comprehensive multiclass taxonomy capturing different forms of explicit and implicit hate, such as:
Gender-Based Hate Speech (Misogyny)
Gender-Based Hate Speech (Misandry)
Immigration & Xenophobic Hate Speech (Anti-Immigrant)
Immigration & Xenophobic Hate Speech (Anti-Refugee)
Immigration & Xenophobic Hate Speech (Xenophobia)
Through this re-annotation effort, the dataset transforms a simple binary classification problem into a 14-class fine-grained hate-speech categorization task, enabling more robust research on model sensitivity, bias analysis, safety evaluation, and explainability.
The dataset is suitable for:
Content-moderation research and safety evaluation
Sociolinguistic analysis of targeted abuse
All user identifiers and personally identifiable information (PII) have been removed or masked to ensure privacy and ethical compliance. The dataset includes the anonymized text, the newly assigned multiclass label, and mapping metadata to the original TweetEval record.
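As a rough illustration of how the fine-grained labels might be related back to coarser schemes, the sketch below splits a label of the form "Parent (Subtype)", as in the category names listed above, and collapses it to the original binary scheme. The non-hate label string `Non-Hate` and the helper names `split_label` and `to_binary` are assumptions made for illustration; the actual label strings and column layout in the released files may differ.

```python
from collections import Counter
from typing import Optional, Tuple

# Assumption: the non-hateful class is stored under this exact string.
NON_HATE_LABEL = "Non-Hate"


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split 'Parent (Subtype)' into (parent, subtype); subtype is None if absent."""
    label = label.strip()
    if label.endswith(")") and " (" in label:
        # Drop the trailing ')' and split on the last ' (' in the string.
        parent, _, subtype = label[:-1].rpartition(" (")
        return parent, subtype
    return label, None


def to_binary(label: str) -> int:
    """Collapse a fine-grained label to the binary scheme: 1 = hate, 0 = non-hate."""
    return 0 if label.strip() == NON_HATE_LABEL else 1


# Label strings taken from the category list in the description,
# plus the assumed non-hate label.
labels = [
    "Gender-Based Hate Speech (Misogyny)",
    "Immigration & Xenophobic Hate Speech (Anti-Refugee)",
    "Immigration & Xenophobic Hate Speech (Xenophobia)",
    NON_HATE_LABEL,
]

parent_counts = Counter(split_label(label)[0] for label in labels)
binary = [to_binary(label) for label in labels]
```

Grouping by parent category gives an intermediate granularity between the binary and the 14-class task, which may be convenient when comparing a multiclass model against a binary TweetEval baseline.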
This resource aims to support researchers, practitioners, and policymakers in building safer and more responsible AI systems capable of detecting nuanced forms of online hate and targeted harassment.
Created:
2025-11-24



