
A Multimodal Bangla Meme Dataset for Hate Speech, Sentiment, and Sarcasm Detection with Text–Image Fusion and Lexicon Annotations

Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/d6t8nbkj96
Description:
Bangla Multimodal Meme Dataset for Hate Speech, Sarcasm, and Offensive Content Detection

This dataset consists of 5,126 Bangla memes annotated for several offensive and contextual attributes: hate speech, sarcasm, vulgarity, violence, humor, and meme category. It is intended to support multimodal NLP research by combining OCR-extracted Bangla text, image metadata, perceptual image fingerprints (pHash), and lexicon-based linguistic features.

Because of copyright restrictions, the original meme images are not distributed. Instead, the dataset provides:
- OCR-extracted Bangla text from each meme
- English translations of the OCR text
- A perceptual hash (pHash) of each image, serving as an image fingerprint
- Image metadata (width and height)
- Manual annotations for hate speech, sarcasm, vulgarity, violence, humor, and category
- A curated Bangla offensive lexicon for auxiliary feature extraction

Researchers can retrieve the original memes by searching the web for the OCR text and then confirm an exact match by comparing the retrieved image's pHash with the provided value. This supports reproducibility while following copyright-safe dataset release practices.

The dataset was annotated by three independent annotators following a shared guideline. Annotation reliability was assessed with Fleiss’ kappa on a stratified subset of 400 memes, showing substantial to near-perfect agreement across labels.

The dataset also includes a labeled Bangla offensive lexicon of 441 terms, categorized as vulgar, insult, violent, or hate-associated. Features derived from this lexicon provide complementary linguistic signals for multimodal fusion experiments.

The dataset is suitable for research in:
- Hate speech detection in Bangla memes
- Sarcasm and humor analysis
- Offensive language detection
- Multimodal text–image fusion models
- Low-resource Bangla NLP

The dataset is released for research and academic use only.
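The pHash verification step can be sketched as follows. This is a minimal illustration, assuming the published hashes are 64-bit values stored as 16-character hex strings; the description does not say which tool produced them. The code follows the common DCT-based pHash recipe (the one used by the Python `imagehash` library's `phash`): take a 32x32 grayscale image, apply a 2-D DCT, keep the top-left 8x8 low-frequency block, and set each bit by comparing against that block's median. Loading, grayscale conversion, and resizing are left to an image library. The helper names and the `max_bits` tolerance are hypothetical; `max_bits=0` corresponds to the "exact match" the description mentions.

```python
import math
from statistics import median


def dct_1d(values):
    """Unnormalized DCT-II. Constant scale factors are dropped because the
    hash only compares coefficients against their median."""
    n = len(values)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, x in enumerate(values))
        for k in range(n)
    ]


def phash_hex(gray, hash_size=8):
    """DCT-based perceptual hash of a square grayscale matrix (e.g. 32x32).

    `gray` is a list of equal-length rows of pixel intensities that has
    already been converted to grayscale and resized elsewhere.
    """
    # Separable 2-D DCT: transform each column, then each row.
    col_dct = [dct_1d(list(col)) for col in zip(*gray)]  # col_dct[j][k]
    by_row = [list(row) for row in zip(*col_dct)]        # by_row[k][j]
    coeffs = [dct_1d(row) for row in by_row]             # coeffs[k][l]
    # Keep the low-frequency top-left block and threshold it at its median.
    low = [v for row in coeffs[:hash_size] for v in row[:hash_size]]
    med = median(low)
    bits = "".join("1" if v > med else "0" for v in low)
    return format(int(bits, 2), f"0{len(bits) // 4}x")


def hamming_hex(a, b):
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def is_match(candidate_hex, dataset_hex, max_bits=0):
    """Hypothetical match rule: hashes differ in at most `max_bits` bits."""
    return hamming_hex(candidate_hex, dataset_hex) <= max_bits
```

A retrieved meme that was re-encoded or slightly resized by the hosting site may not hash identically, so a small nonzero `max_bits` can be a reasonable, though assumed, relaxation of the exact-match check.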
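For readers reproducing the reliability check, Fleiss' kappa compares observed agreement with chance agreement: for N items each rated by n annotators, with n_ij raters assigning item i to category j, P_i = (Σ_j n_ij² − n) / (n(n − 1)), P̄ is the mean of the P_i, p_j is the overall share of ratings in category j, P_e = Σ_j p_j², and κ = (P̄ − P_e) / (1 − P_e). The sketch below computes this for a single label (for example, hate speech), assuming every item has the same number of ratings. The input layout is an assumption, not the dataset's file format. On the commonly cited Landis and Koch scale, "substantial" is κ of 0.61 to 0.80 and "almost perfect" is 0.81 to 1.00; the description does not state which scale was used.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each rated by the same number of annotators.

    `ratings` has one entry per item; each entry is the sequence of labels
    the annotators assigned to that item.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    per_item = [Counter(r) for r in ratings]
    categories = set().union(*per_item)

    # Observed agreement P_i per item, then its mean P_bar.
    p_i = [
        (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
        for counts in per_item
    ]
    p_bar = sum(p_i) / n_items

    # Chance agreement P_e from the overall category proportions p_j.
    total = n_items * n_raters
    p_e = sum((sum(counts[cat] for counts in per_item) / total) ** 2 for cat in categories)

    if p_e == 1:
        raise ValueError("kappa is undefined when every rating uses one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy labels: 4 memes, 3 annotators each.
    toy = [
        ["hate", "hate", "hate"],
        ["not", "not", "not"],
        ["hate", "not", "hate"],
        ["not", "not", "not"],
    ]
    print(round(fleiss_kappa(toy), 3))  # 0.657
```

In practice the function would be run once per annotated attribute (hate speech, sarcasm, vulgarity, and so on), since each has its own label set.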
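One simple way to turn the 441-term lexicon into auxiliary features is to count, per category, how many lexicon terms appear in a meme's OCR text. The sketch below assumes the lexicon has been loaded into a Python dict mapping each term to one of the four categories; the released file's actual layout, and the authors' own feature extraction, are not described here, so this is an illustration only. Text is Unicode-normalized (NFC) because OCR output and lexicon entries can encode the same Bangla word differently, and tokens are split on whitespace and punctuation including the Bangla danda (।).

```python
import re
import unicodedata
from collections import Counter

CATEGORIES = ("vulgar", "insult", "violent", "hate")

# Token boundaries: whitespace plus common punctuation, including the Bangla
# danda (।). Bangla vowel signs are not in this class, so words stay whole.
TOKEN_SPLIT = re.compile(r"[\s।,.!?;:()]+")


def lexicon_features(text, lexicon):
    """Per-category counts of lexicon terms found in one OCR string.

    `lexicon` maps a term to one of CATEGORIES (assumed format). Matching is
    exact and token-level, so multi-word terms and inflected forms are missed.
    """
    norm_lexicon = {unicodedata.normalize("NFC", k): v for k, v in lexicon.items()}
    counts = Counter()
    for token in TOKEN_SPLIT.split(unicodedata.normalize("NFC", text)):
        category = norm_lexicon.get(token)
        if category in CATEGORIES:
            counts[category] += 1
    return {cat: counts[cat] for cat in CATEGORIES}


if __name__ == "__main__":
    # Placeholder entries for illustration; not taken from the released lexicon.
    demo_lexicon = {"ক": "insult", "খ": "hate"}
    print(lexicon_features("ক খ ক। গ", demo_lexicon))
```

The resulting four counts could be concatenated with text and image representations in a fusion model, which is one way to use them as the complementary linguistic signals the description refers to.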
Created: 2026-02-02