aiatums/INDOMEME

Name: aiatums/INDOMEME
Creator: aiatums
Published: 2026-04-03 10:24:35
License: Not specified on this listing (the dataset card below declares `cc-by-4.0`)

Source: Hugging Face | Updated: 2026-04-03 | Indexed: 2026-04-12

Download link:

https://hf-mirror.com/datasets/aiatums/INDOMEME


Resource summary:

---
language:
- id
license: cc-by-4.0
task_categories:
- image-classification
- text-classification
tags:
- hate-speech
- meme
- multimodal
- indonesian
- social-media
---

# INDOMEME

INDOMEME is a multimodal dataset of Indonesian memes collected from Facebook, annotated for hate speech detection and content appropriateness classification. Each meme is enriched with OCR-extracted text and LLM-generated captions to support multimodal analysis.

## Dataset Columns

| Column | Description |
|--------|-------------|
| `image` | Meme image |
| `image_path` | Original filename of the meme image |
| `hate_final` | Hatefulness label: `hate` or `not hate` |
| `appropriate_final` | Appropriateness label: `appropriate` or `inappropriate` |
| `topic` | Topical focus in English (e.g., `gender`, `political`, `individual`) |
| `topic_id` | Topical focus in Indonesian |
| `ocr` | Text extracted from the meme image using Qwen2-VL-2B |
| `caption` | Formal image caption generated by Gemini 2.5 Flash describing the visual content and implied message |

### Label Details

**`hate_final`**: A meme is labeled `hate` if it contains insults, harassment, negative stereotypes, or demeaning insinuations toward individuals or groups. Otherwise, it is labeled `not hate`.

**`appropriate_final`**: A meme is labeled `inappropriate` if it contains coarse language, sexual references, depictions of violence, or other content unsuitable for public viewing. Otherwise, it is labeled `appropriate`. Note that all hateful memes are also labeled as `inappropriate`.

**`topic` / `topic_id`**: Topical focus categories include Gender, Individual, National Origin/Ethnicity/Race, Political, Religion, Institution/Company, Social Sub-groups, and None/Others. A meme may have multiple topics.
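The label rules above translate directly into training targets for a multitask model. The Python sketch below maps `hate_final` and `appropriate_final` to 0/1 targets, checks the card's rule that every `hate` meme is also `inappropriate`, and builds a multi-hot topic vector. It assumes each row is a plain dict keyed by the column names above. The function names, the lowercase topic spellings other than `gender`, `political`, and `individual`, and the comma-separated storage of multiple topics are all assumptions, since the card does not document them.

```python
# Topic categories from the dataset card, in lowercase.
# ASSUMPTION: only `gender`, `political`, and `individual` are confirmed
# spellings of the `topic` column; the other keys are hypothetical.
TOPICS = [
    "gender",
    "individual",
    "national origin/ethnicity/race",
    "political",
    "religion",
    "institution/company",
    "social sub-groups",
    "none/others",
]


def encode_labels(row: dict) -> dict:
    """Map the two string labels to 0/1 targets (hypothetical helper)."""
    hate = {"not hate": 0, "hate": 1}[row["hate_final"]]
    inappropriate = {"appropriate": 0, "inappropriate": 1}[row["appropriate_final"]]
    return {"hate": hate, "inappropriate": inappropriate}


def is_consistent(row: dict) -> bool:
    """Check the card's rule: a hateful meme must also be inappropriate."""
    labels = encode_labels(row)
    return labels["hate"] == 0 or labels["inappropriate"] == 1


def topic_multi_hot(topic_field: str, sep: str = ",") -> list:
    """Turn a multi-topic string into a multi-hot vector over TOPICS.

    ASSUMPTION: multiple topics are stored in one string separated by `sep`;
    the card only says that a meme may have multiple topics.
    """
    present = {t.strip().lower() for t in topic_field.split(sep) if t.strip()}
    unknown = present - set(TOPICS)
    if unknown:
        raise ValueError(f"unrecognised topic(s): {sorted(unknown)}")
    return [1 if t in present else 0 for t in TOPICS]


if __name__ == "__main__":
    # A made-up row for illustration; not taken from the dataset.
    sample = {
        "hate_final": "hate",
        "appropriate_final": "inappropriate",
        "topic": "gender, political",
    }
    print(encode_labels(sample))             # {'hate': 1, 'inappropriate': 1}
    print(is_consistent(sample))             # True
    print(topic_multi_hot(sample["topic"]))  # [1, 0, 0, 1, 0, 0, 0, 0]
```

Raising on unknown topics, rather than silently dropping them, makes a mismatch between the assumed spellings and the real column values visible on the first pass over the data.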
## Citation

If you use this dataset, please cite:

```bibtex
@article{pamungkas2026indomeme,
  title   = {Decoding hate in memes: multimodal and multitask approaches for low-resource Indonesian social media},
  author  = {Pamungkas, Endang Wahyu and Wahyuni, Cahyaningtyas Sekar and Amal, Ikhlasul and Purworini, Dian and Rintyarna, Bagus Setya},
  journal = {PeerJ Computer Science},
  volume  = {12},
  pages   = {e3736},
  year    = {2026},
  doi     = {10.7717/peerj-cs.3736}
}
```


Provider: aiatums
