Bn-HIB: A Benchmark Bengali Multimodal Dataset for Detecting Hate Speech and Inflammatory Content in Memes

Name: Bn-HIB: A Benchmark Bengali Multimodal Dataset for Detecting Hate Speech and Inflammatory Content in Memes
Creator: Mendeley Data
Published: 2026-04-15 13:12:19
License: No description available

Source: DataCite Commons; updated 2026-04-15; indexed 2026-05-04

Download link: https://data.mendeley.com/datasets/9vg79v65nr

Description:

Dataset Overview: The Bn-HIB (Bangla Hate–Inflammatory–Benign) dataset is a multimodal resource for detecting harmful content in Bengali memes. It contains 3,247 manually annotated memes and is the first Bengali dataset to explicitly distinguish inflammatory content from direct hate speech.

Data Splits: The dataset is divided into three standard subsets:
Training set (70%): 2,272 instances
Validation set (15%): 487 instances
Test set (15%): 488 instances

| Class             | Training | Validation | Testing | Total |
| ----------------- | -------- | ---------- | ------- | ----- |
| Hate (HM)         | 811      | 174        | 173     | 1,158 |
| Inflammatory (IM) | 773      | 166        | 167     | 1,106 |
| Benign (BM)       | 688      | 147        | 148     | 983   |
| Total             | 2,272    | 487        | 488     | 3,247 |

Key Characteristics:
Multimodal Content: Each instance consists of an image together with its embedded text.
Language Variety: Includes standard Bengali, Bengali-English code-mixed, and code-switched memes.
Annotation Process: The memes were annotated by three fluent Bengali speakers, who followed a structured decision tree to keep labels consistent. Inter-annotator agreement reached a Fleiss’ kappa of 0.79, indicating substantial agreement.
Data Source: Collected from 25 public Facebook groups and pages with high meme activity.
Text Extraction: Text within the images was extracted using the Gemini API and manually verified for script accuracy.

Significance: This dataset provides a benchmark for research on multimodal hate speech detection, particularly for low-resource languages such as Bengali. Its distinction between hate and inflammatory content enables more nuanced modelling and analysis of harmful online behaviour.
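The split counts above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the dataset release: the SPLITS dictionary simply transcribes the class table, and the helper names (split_totals, split_fractions, class_share_per_split) are hypothetical. It confirms that the per-class counts add up to the reported split sizes and compares the class mix across splits. The description does not say how the split was produced, so similar proportions are only suggestive of stratification.

```python
"""Sanity checks on the Bn-HIB split counts (transcribed from the class table)."""

# class -> (training, validation, testing) counts, copied from the dataset description
SPLITS = {
    "Hate (HM)": (811, 174, 173),
    "Inflammatory (IM)": (773, 166, 167),
    "Benign (BM)": (688, 147, 148),
}


def split_totals(splits):
    """Sum class counts column-wise to get (training, validation, testing) sizes."""
    return tuple(sum(column) for column in zip(*splits.values()))


def split_fractions(splits):
    """Share of the whole dataset that falls into each split."""
    totals = split_totals(splits)
    grand_total = sum(totals)
    return tuple(t / grand_total for t in totals)


def class_share_per_split(splits):
    """Fraction of each class inside each split (a rough stratification check)."""
    totals = split_totals(splits)
    return {
        name: tuple(count / total for count, total in zip(counts, totals))
        for name, counts in splits.items()
    }


if __name__ == "__main__":
    print("split sizes:", split_totals(SPLITS))
    print("split fractions:", [round(f, 3) for f in split_fractions(SPLITS)])
    for name, shares in class_share_per_split(SPLITS).items():
        print(f"{name:<18}", [round(s, 3) for s in shares])
```

Run as a script, it reports split sizes of 2,272, 487 and 488 (about 70/15/15 of 3,247), and each class's share differs by well under one percentage point between the three splits.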
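The reported agreement figure is a Fleiss’ kappa over three annotators. As a reference point, here is a minimal Python sketch of the standard Fleiss’ kappa computation. It is not the authors’ evaluation script: the function name and the toy ratings are hypothetical, and only the HM/IM/BM codes come from the class table. Kappa compares the mean observed pairwise agreement per item, P̄, with the agreement expected by chance from the marginal label proportions, P̄e, as κ = (P̄ - P̄e) / (1 - P̄e).

```python
"""Fleiss' kappa for items rated by a fixed number of annotators."""
from collections import Counter
from typing import Sequence

# Label codes taken from the Bn-HIB class names; the toy ratings below are invented.
CATEGORIES = ("HM", "IM", "BM")


def fleiss_kappa(ratings: Sequence[Sequence[str]], categories: Sequence[str] = CATEGORIES) -> float:
    """Return Fleiss' kappa; ratings[i] holds every annotator's label for item i."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j]: number of annotators who assigned item i to category j
    counts = []
    for item_labels in ratings:
        tally = Counter(item_labels)
        unknown = set(tally) - set(categories)
        if unknown:
            raise ValueError(f"unknown label(s): {unknown}")
        counts.append([tally[c] for c in categories])

    # Observed agreement: share of agreeing annotator pairs per item, then averaged
    per_item = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the marginal category proportions
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every label falls in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical ratings from three annotators on four memes (not Bn-HIB data)
    toy = [
        ("HM", "HM", "HM"),
        ("IM", "IM", "HM"),
        ("BM", "BM", "BM"),
        ("IM", "IM", "IM"),
    ]
    print(round(fleiss_kappa(toy), 3))  # 70/94, printed as 0.745
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement, which is where the reported 0.79 falls.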

Provider: Mendeley Data

Created: 2026-04-15
