Hate Speech and Offensive Language
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/t-davidson/hate-speech-and-offensive-language
Description:
This dataset is a collection of tweets annotated for hate speech detection. The hate speech covers racism, sexism, homophobia, and offensive language. The class distribution is highly imbalanced: of the 24,783 samples, 1,430 are hate speech and 23,353 are non-hate speech, a ratio of roughly 1:16.
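The imbalance can be checked directly from the counts quoted above. The following is a minimal sketch, not part of the dataset's own tooling: the function names and the label keys `hate` and `non_hate` are illustrative assumptions, and the weighting formula is the common inverse-frequency heuristic (the same one scikit-learn calls "balanced"), which is one possible way to compensate for the skew when training a classifier.

```python
# Sample counts as stated in the dataset description.
N_HATE = 1_430
N_NON_HATE = 23_353
N_TOTAL = N_HATE + N_NON_HATE  # 24,783


def imbalance_ratio(minority: int, majority: int) -> float:
    """Number of majority-class samples per minority-class sample."""
    return majority / minority


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: total / (n_classes * class_count).

    Each class ends up contributing the same total weight, so the
    rare class is not drowned out by the frequent one.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Label names are hypothetical; the description does not specify them.
counts = {"hate": N_HATE, "non_hate": N_NON_HATE}
ratio = imbalance_ratio(N_HATE, N_NON_HATE)
weights = balanced_class_weights(counts)

print(f"non-hate samples per hate sample: {ratio:.1f}")
print(f"share of hate speech samples: {N_HATE / N_TOTAL:.1%}")
print(f"class weights: {weights}")
```

With these counts there are about 16 non-hate samples for every hate sample, so a model that always predicts "non-hate" would already reach high accuracy. Metrics such as per-class precision, recall, or F1 are therefore more informative than accuracy on this dataset.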
Provided by:
Davidson et al.



