aiatums/INDOMEME
Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/aiatums/INDOMEME
Description:
---
language:
- id
license: cc-by-4.0
task_categories:
- image-classification
- text-classification
tags:
- hate-speech
- meme
- multimodal
- indonesian
- social-media
---
# INDOMEME
INDOMEME is a multimodal dataset of Indonesian memes collected from Facebook, annotated for hate speech detection and content appropriateness classification. Each meme is enriched with OCR-extracted text and LLM-generated captions to support multimodal analysis.
## Dataset Columns
| Column | Description |
|--------|-------------|
| `image` | Meme image |
| `image_path` | Original filename of the meme image |
| `hate_final` | Hatefulness label: `hate` or `not hate` |
| `appropriate_final` | Appropriateness label: `appropriate` or `inappropriate` |
| `topic` | Topical focus in English (e.g., `gender`, `political`, `individual`) |
| `topic_id` | Topical focus in Indonesian |
| `ocr` | Text extracted from the meme image using Qwen2-VL-2B |
| `caption` | Formal image caption generated by Gemini 2.5 Flash describing the visual content and implied message |
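
As a rough illustration of how the columns above might be consumed, the sketch below counts label pairs over an iterable of rows. It assumes the Hugging Face `datasets` library is installed, that the repository id matches the download link (`aiatums/INDOMEME`), and that the label columns are stored as the plain strings shown in the table; none of these details are confirmed beyond what this card states. The helper names `load_indomeme` and `label_counts` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

# Repository id taken from the download link on this card.
REPO_ID = "aiatums/INDOMEME"


def load_indomeme():
    """Load every available split (needs `datasets` and network access).

    Split names are not documented on this card, so none is assumed here.
    """
    # Imported lazily so label_counts() below can be used offline.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


def label_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count (hate_final, appropriate_final) pairs over an iterable of rows."""
    pairs: Iterable[Tuple[str, str]] = (
        (row["hate_final"], row["appropriate_final"]) for row in rows
    )
    return Counter(pairs)


# Possible usage (commented out because it downloads the dataset):
# for name, split in load_indomeme().items():
#     # Selecting only the label columns avoids decoding every image.
#     labels = split.select_columns(["hate_final", "appropriate_final"])
#     print(name, label_counts(labels))
```

Counting pairs rather than each column separately makes the relationship between the two labels visible at a glance.
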
### Label Details
**`hate_final`**: A meme is labeled `hate` if it contains insults, harassment, negative stereotypes, or demeaning insinuations toward individuals or groups. Otherwise it is labeled `not hate`.
**`appropriate_final`**: A meme is labeled `inappropriate` if it contains coarse language, sexual references, depictions of violence, or other content unsuitable for public viewing. Otherwise it is labeled `appropriate`. Note that all hateful memes are also labeled as `inappropriate`.
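
Because every `hate` meme is also `inappropriate`, only three of the four label combinations should occur. The sketch below checks that invariant and maps the valid combinations to a single class id, which may be convenient for a joint classifier. The function names and the class ids are illustrative assumptions, not part of the dataset or the authors' method.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

# The three combinations permitted by the labeling rule; ids are arbitrary.
JOINT_CLASSES: Dict[Tuple[str, str], int] = {
    ("not hate", "appropriate"): 0,
    ("not hate", "inappropriate"): 1,
    ("hate", "inappropriate"): 2,
}


def find_violations(rows: Iterable[Mapping[str, str]]) -> List[int]:
    """Return indices of rows labeled `hate` but not `inappropriate`."""
    return [
        index
        for index, row in enumerate(rows)
        if row["hate_final"] == "hate"
        and row["appropriate_final"] != "inappropriate"
    ]


def joint_label(row: Mapping[str, str]) -> int:
    """Map a row's two labels to one class id, rejecting invalid pairs."""
    key = (row["hate_final"], row["appropriate_final"])
    if key not in JOINT_CLASSES:
        raise ValueError(f"unexpected label combination: {key}")
    return JOINT_CLASSES[key]
```

An empty result from `find_violations` indicates that the loaded rows respect the rule stated above.
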
**`topic` / `topic_id`**: Topical focus categories include: Gender, Individual, National Origin/Ethnicity/Race, Political, Religion, Institution/Company, Social Sub-groups, and None/Others. A meme may have multiple topics.
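
Since a meme may carry several topics, a multi-hot encoding is one natural way to use the `topic` column. The sketch below assumes multiple topics are stored in a single string joined by a separator; the card does not say how they are encoded, so the comma default is a guess and should be checked against the data. The vocabulary lists only the three example values shown in the column table, and the helper names are hypothetical.

```python
from typing import List, Sequence

# Only the example values shown in the column table; extend after inspecting the data.
EXAMPLE_VOCAB: List[str] = ["gender", "individual", "political"]


def split_topics(raw: str, sep: str = ",") -> List[str]:
    """Split a raw topic string into lowercase names (`sep` is an assumption)."""
    return [part.strip().lower() for part in raw.split(sep) if part.strip()]


def multi_hot(topics: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Return a 0/1 vector marking which vocabulary entries are present."""
    present = set(topics)
    return [1 if name in present else 0 for name in vocab]


# Example with a made-up topic string:
vector = multi_hot(split_topics("gender, political"), EXAMPLE_VOCAB)
```

Topics outside the supplied vocabulary are silently ignored here; a stricter variant could raise an error instead.
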
## Citation
If you use this dataset, please cite:
```bibtex
@article{pamungkas2026indomeme,
title = {Decoding hate in memes: multimodal and multitask approaches for low-resource Indonesian social media},
author = {Pamungkas, Endang Wahyu and Wahyuni, Cahyaningtyas Sekar and Amal, Ikhlasul and Purworini, Dian and Rintyarna, Bagus Setya},
journal = {PeerJ Computer Science},
volume = {12},
pages = {e3736},
year = {2026},
doi = {10.7717/peerj-cs.3736}
}
```
Provider:
aiatums



