
aiatums/INDOMEME

Hugging Face | Updated 2026-04-03 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/aiatums/INDOMEME
Description:
---
language:
- id
license: cc-by-4.0
task_categories:
- image-classification
- text-classification
tags:
- hate-speech
- meme
- multimodal
- indonesian
- social-media
---

# INDOMEME

INDOMEME is a multimodal dataset of Indonesian memes collected from Facebook, annotated for hate speech detection and content appropriateness classification. Each meme is enriched with OCR-extracted text and LLM-generated captions to support multimodal analysis.

## Dataset Columns

| Column | Description |
|--------|-------------|
| `image` | Meme image |
| `image_path` | Original filename of the meme image |
| `hate_final` | Hatefulness label: `hate` or `not hate` |
| `appropriate_final` | Appropriateness label: `appropriate` or `inappropriate` |
| `topic` | Topical focus in English (e.g., `gender`, `political`, `individual`) |
| `topic_id` | Topical focus in Indonesian |
| `ocr` | Text extracted from the meme image using Qwen2-VL-2B |
| `caption` | Formal image caption generated by Gemini 2.5 Flash describing the visual content and implied message |

### Label Details

**`hate_final`**: A meme is labeled `hate` if it contains insults, harassment, negative stereotypes, or demeaning insinuations toward individuals or groups. Otherwise it is labeled `not hate`.

**`appropriate_final`**: A meme is labeled `inappropriate` if it contains coarse language, sexual references, depictions of violence, or other content unsuitable for public viewing. Otherwise it is labeled `appropriate`. Note that all hateful memes are also labeled as `inappropriate`.

**`topic` / `topic_id`**: Topical focus categories include: Gender, Individual, National Origin/Ethnicity/Race, Political, Religion, Institution/Company, Social Sub-groups, and None/Others. A meme may have multiple topics.
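
The label rules above imply a simple consistency check: any row with `hate_final` equal to `hate` should also have `appropriate_final` equal to `inappropriate`. The following is a minimal sketch of that check, not code from the dataset authors. The helper names (`find_label_conflicts`, `label_counts`) and the sample rows are hypothetical and only mirror the column names and label values listed in the card.

```python
from collections import Counter
from typing import Iterable, Mapping


def find_label_conflicts(rows: Iterable[Mapping[str, str]]) -> list[str]:
    """Return the image_path of every row labeled `hate` but not `inappropriate`."""
    return [
        row["image_path"]
        for row in rows
        if row["hate_final"] == "hate"
        and row["appropriate_final"] != "inappropriate"
    ]


def label_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count how often each (hate_final, appropriate_final) pair occurs."""
    return Counter((row["hate_final"], row["appropriate_final"]) for row in rows)


# Hypothetical rows for illustration only; these are not real dataset entries.
sample_rows = [
    {"image_path": "meme_001.jpg", "hate_final": "hate",
     "appropriate_final": "inappropriate"},
    {"image_path": "meme_002.jpg", "hate_final": "not hate",
     "appropriate_final": "appropriate"},
    {"image_path": "meme_003.jpg", "hate_final": "not hate",
     "appropriate_final": "inappropriate"},
    # Deliberately inconsistent row, to show what the check reports.
    {"image_path": "meme_004.jpg", "hate_final": "hate",
     "appropriate_final": "appropriate"},
]

conflicts = find_label_conflicts(sample_rows)
counts = label_counts(sample_rows)
print("Rows violating the hate/inappropriate rule:", conflicts)
print("Label pair counts:", dict(counts))
```

To run the same check on the actual data, the rows could presumably be obtained with `datasets.load_dataset("aiatums/INDOMEME")` from the Hugging Face `datasets` library. The card does not state the split names, so those would need to be inspected first.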
## Citation

If you use this dataset, please cite:

```bibtex
@article{pamungkas2026indomeme,
  title   = {Decoding hate in memes: multimodal and multitask approaches for low-resource Indonesian social media},
  author  = {Pamungkas, Endang Wahyu and Wahyuni, Cahyaningtyas Sekar and Amal, Ikhlasul and Purworini, Dian and Rintyarna, Bagus Setya},
  journal = {PeerJ Computer Science},
  volume  = {12},
  pages   = {e3736},
  year    = {2026},
  doi     = {10.7717/peerj-cs.3736}
}
```

Provider:
aiatums