A Multimodal Bangla Meme Dataset for Hate Speech, Sentiment, and Sarcasm Detection with Text–Image Fusion and Lexicon Annotations

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/d6t8nbkj96

Description:

Bangla Multimodal Meme Dataset for Hate Speech, Sarcasm, and Offensive Content Detection

This dataset consists of 5,126 Bangla memes annotated for several offensive and contextual attributes: hate speech, sarcasm, vulgarity, violence, humor, and category. It is intended to support multimodal NLP research by combining OCR-extracted Bangla text, image metadata, perceptual image fingerprints (pHash), and lexicon-based linguistic features.

Because of copyright restrictions, the original meme images are not distributed. Instead, the dataset provides:
- OCR-extracted Bangla text from each meme
- English translations
- A perceptual hash (pHash) of each image as a fingerprint
- Image metadata (width and height)
- Manual annotations for hate speech, sarcasm, vulgarity, violence, humor, and category
- A curated Bangla offensive lexicon for auxiliary feature extraction

Researchers can retrieve the original memes by searching the web for the OCR text and confirm that a retrieved image is the same meme by comparing its pHash with the provided value. This supports reproducibility while following copyright-safe dataset release practices.

The dataset was annotated by three independent annotators following a shared guideline. Annotation reliability was assessed with Fleiss' kappa on a stratified subset of 400 memes, showing substantial to near-perfect agreement across labels.

The dataset also includes a labeled Bangla offensive lexicon of 441 terms, categorized as vulgar, insult, violent, or hate-associated. These lexicon features provide complementary linguistic signals for multimodal fusion experiments.

This dataset is suitable for research in:
- Hate speech detection in Bangla memes
- Sarcasm and humor analysis
- Offensive language detection
- Multimodal text–image fusion models
- Low-resource Bangla NLP

The dataset is released for research and academic use only.
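The pHash check described above can be sketched in a few lines of Python. This is a minimal sketch that assumes each pHash is stored as a 64-bit hexadecimal string (the format produced by `str(imagehash.phash(image))` in the `imagehash` package); the actual column format and hashing settings used by the authors are not stated here. `hamming_distance_hex` and `is_same_meme` are hypothetical helper names, and the `max_distance` tolerance is an assumption: a re-downloaded, recompressed copy of a meme may hash a few bits away from the stored value.

```python
def hamming_distance_hex(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same hex length")
    # XOR the integer values: each set bit marks a position where the hashes differ.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_same_meme(candidate_hash: str, dataset_hash: str, max_distance: int = 0) -> bool:
    """Hypothetical helper: accept a retrieved image if its pHash is within
    max_distance bits of the dataset's pHash (0 means identical hashes only)."""
    return hamming_distance_hex(candidate_hash, dataset_hash) <= max_distance


# Made-up 64-bit hashes (16 hex characters), not values from the dataset.
provided = "c3d1e0f0b8a89c8e"
retrieved = "c3d1e0f0b8a89c8f"  # differs only in the last bit
print(hamming_distance_hex(provided, retrieved))          # 1
print(is_same_meme(retrieved, provided))                   # False
print(is_same_meme(retrieved, provided, max_distance=4))   # True
```

To hash a retrieved image, `str(imagehash.phash(Image.open(path)))` (with Pillow's `Image`) would give a comparable string, provided the dataset was built with the same default hash size.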
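The agreement result reported above relies on Fleiss' kappa, which compares the observed pairwise agreement between annotators with the agreement expected from the overall label distribution. The sketch below implements the standard formula in plain Python. How the authors built the count matrix for each label (for example, binary versus multi-class) is not described, so the toy data is illustrative only; `fleiss_kappa` is a hypothetical helper name. For this dataset, each label would give a 400-row matrix whose rows sum to 3 annotators.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a matrix where counts[i][j] is the number of annotators
    who assigned item i to category j. Every row must sum to the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (at least 2)")

    total = n_items * n_raters
    # p_j: share of all assignments that went to category j
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # P_i: fraction of annotator pairs that agree on item i
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts]
    p_bar = sum(per_item) / n_items   # mean observed agreement
    p_e = sum(pj * pj for pj in p)     # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 4 items, 3 annotators, binary label (columns: not-hate, hate).
toy = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(toy), 3))  # 0.625
```

Values of roughly 0.61 to 0.80 are conventionally read as substantial agreement and values above 0.80 as near-perfect agreement.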
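The lexicon features mentioned above can be derived in several ways. One simple option is to count, per category, how many OCR tokens match a lexicon term. This is a minimal sketch under stated assumptions: the lexicon is loaded as a term-to-category dictionary, text is split on whitespace and punctuation (including the Bangla danda "।"), and only exact single-word matches are counted. The file format of the released lexicon and the authors' actual feature definition are not given here. `lexicon_features` and the placeholder terms are hypothetical.

```python
import re
from collections import Counter

# Category names follow the lexicon description; the dict format is an assumption.
CATEGORIES = ("vulgar", "insult", "violent", "hate")

# Split on whitespace and common punctuation, including the Bangla danda (।).
TOKEN_SPLIT = re.compile(r"[\s।,.!?;:'\"()]+")


def lexicon_features(text: str, lexicon: dict[str, str]) -> dict[str, int]:
    """Count how many tokens of `text` fall in each lexicon category.

    `lexicon` maps a term to one of CATEGORIES (hypothetical format).
    Only exact single-token matches are counted.
    """
    tokens = [tok for tok in TOKEN_SPLIT.split(text) if tok]
    hits = Counter(lexicon[tok] for tok in tokens if tok in lexicon)
    return {cat: hits.get(cat, 0) for cat in CATEGORIES}


# Placeholder terms stand in for real lexicon entries.
toy_lexicon = {"term_a": "vulgar", "term_b": "insult", "term_c": "hate"}
print(lexicon_features("term_a term_b, term_a। other words", toy_lexicon))
# {'vulgar': 2, 'insult': 1, 'violent': 0, 'hate': 0}
```

Exact matching misses inflected Bangla forms and multi-word entries; stemming or substring matching would raise recall at the cost of more false positives. The four counts could then be concatenated with text and image representations as auxiliary inputs to a fusion model.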

Created: 2026-02-02