Multimodal Hate Speech Detection: A Novel Deep Learning Framework for Multilingual Text and Images
Source: Figshare. Updated: 2025-02-04. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_b_Multimodal_Multilingual_Hate_Speech11K_MMHS11K_Dataset_b_/27310764
Resource description:
Furqan Khan Saddozai1, Sahar K Badri2, Daniyal Alghazzawi2, Asad Khattak3, and Muhammad Zubair Asghar1*
1 Gomal Research Institute of Computing (GRIC), Faculty of Computing, Gomal University, D.I.Khan (KP), Pakistan
2 Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
3 College of Technological Innovation, Zayed University, Abu Dhabi Campus, 144534, Abu Dhabi, UAE

Dataset Description
The Multimodal Multilingual Hate Speech 11K (MMHS11K) Dataset consists of 11,000 labeled Urdu tweets with English translations, each categorized as either 'Hate' or 'No-Hate'. The dataset is divided into a training file (MMHS11K_train.xlsx) and a test file (MMHS11K_test.xlsx). The training file contains 8,800 records, split equally between the 'Hate' and 'No-Hate' classes (4,400 each). The test file contains 2,200 records: 1,100 'Hate' and 1,100 'No-Hate' samples.

Each file contains six columns:
Tweet_Id: The unique identifier of the tweet.
Text: The textual content of the tweet in Urdu.
Text (English): The English translation of the Urdu tweet.
Image_Text: The Urdu text extracted from the tweet's image. If no text is extracted, the value is 'NIL'.
Image_Text (English): The English translation of the Image_Text value. If Image_Text is 'NIL', this column is also 'NIL'.
Label: The classification of the multimodal multilingual tweet as either 'Hate' or 'No-Hate'.

The dataset contains two folders, each distributed as a .rar archive:
MMHS11K_RGB_train.rar: Holds the training images in RGB format in two subfolders, Hate and No-Hate. For each tweet in MMHS11K_train.xlsx, the corresponding image is stored in the subfolder that matches its label, with the tweet's Tweet_Id as the filename. For example, the image for Tweet_Id = 1596192893743796224 is stored in the No-Hate subfolder of MMHS11K_RGB_train, because this tweet is labeled No-Hate.
MMHS11K_RGB_test.rar: Holds the test images in RGB format. It also includes two subfolders: Hate and No-Hate.

Other Information
Accepted for publication in: PeerJ Computer Science
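
The folder layout described above can be sketched in Python as follows. This is a minimal illustration and not code shipped with the dataset: the helper names (image_path, clean_image_text, label_counts) are hypothetical, the archives are assumed to have been extracted under a common root directory, and the ".jpg" extension is an assumption because the description does not state the image file format. Tweet_Id is handled as a string so that a 19-digit identifier is not rounded by a floating-point conversion.

```python
from collections import Counter
from pathlib import Path

LABELS = ("Hate", "No-Hate")
SPLITS = ("train", "test")


def image_path(root, split, label, tweet_id, ext=".jpg"):
    """Build the expected image location for one tweet.

    Assumes the .rar archives were extracted under `root`, and that images
    use the extension `ext` (an assumption; the format is not documented).
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    # Layout from the description: MMHS11K_RGB_<split>/<label>/<Tweet_Id>
    return Path(root) / f"MMHS11K_RGB_{split}" / label / f"{tweet_id}{ext}"


def clean_image_text(value):
    """Map the 'NIL' placeholder (no text found in the image) to None."""
    return None if value.strip() == "NIL" else value


def label_counts(rows):
    """Count records per class, e.g. to confirm the documented 50/50 split."""
    return Counter(row["Label"] for row in rows)


if __name__ == "__main__":
    # The example tweet from the description, labeled No-Hate.
    print(image_path("data", "train", "No-Hate", "1596192893743796224"))
```

After reading MMHS11K_train.xlsx with a spreadsheet reader of your choice, passing its rows to label_counts should report 4,400 records per class if the file matches the description.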
Created:
2025-02-04



