Twitter Hate Speech Dataset
Source: arXiv (indexed 2025-09-30)
Download link:
https://sites.google.com/view/icwsm2020datachallenge
Description:
This dataset contains tweets annotated with one of four labels: normal, abusive, hateful, or spam. The classes are markedly imbalanced. During preprocessing, the researchers handled emojis, emoticons, hashtags, and links so as to retain useful information while keeping the data clean. The dataset's task is hate speech detection.
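The description does not say how emojis, emoticons, hashtags, and links were actually handled, so the following is only a minimal sketch of one common approach, not the researchers' pipeline. The placeholder tokens (`<url>`, `<emoticon>`, `<emoji>`), the function name `clean_tweet`, the camel-case hashtag splitting, and the deliberately narrow emoticon and emoji patterns are all assumptions made for illustration.

```python
import re

# Assumed patterns; real pipelines typically use broader emoji/emoticon coverage.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Simple western emoticons such as :) ;-) =D :P
EMOTICON_RE = re.compile(r"(?<!\w)[:;=]-?[)(\]\[DPp/\\|](?!\w)")
# A few common emoji and symbol ranges (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def _split_hashtag(match):
    """Drop the '#' and split camel case, e.g. '#StopHate' -> 'Stop Hate'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", match.group(1))


def clean_tweet(text):
    """Replace links, emoticons, and emojis with placeholder tokens
    and turn hashtags into plain words, then normalize whitespace."""
    text = URL_RE.sub(" <url> ", text)          # links first, so ':' in URLs is gone
    text = HASHTAG_RE.sub(_split_hashtag, text)
    text = EMOTICON_RE.sub(" <emoticon> ", text)
    text = EMOJI_RE.sub(" <emoji> ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_tweet("Check this #StopHate :) https://t.co/abc"))
```

Replacing these elements with placeholder tokens, rather than deleting them, keeps the signal that a tweet contained a link or an emoji while removing the noisy surface form. Whether that trade-off matches the original preprocessing is not stated in the description.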
Provided by:
Founta et al.



