
QCRI/MemeLens-VLM

Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/QCRI/MemeLens-VLM
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- image-classification
- text-classification
- visual-question-answering
language:
- ar
- bn
- de
- en
- es
- hi
- ro
- ru
- zh
tags:
- memes
- multimodal
- multilingual
- hate-speech
- explanation
- llm-judge
size_categories:
- 100K<n<1M
dataset_info:
- config_name: default
  splits:
  - name: train
  - name: test
  - name: val
---

# MemeLens-VLM

A large-scale multilingual multimodal meme understanding benchmark with 46 classification tasks across 9 languages, enriched with LLM-generated explanations and LLM-as-Judge quality scores.

This is the VLM (Vision-Language Model) version of [MemeLens](https://huggingface.co/datasets/QCRI/MemeLens), extended with natural language explanations for each sample and automated quality evaluation via LLM-as-Judge.

**Paper:** [MemeLens: A Multimodal, Multilingual Benchmark for Meme Understanding](https://arxiv.org/abs/2601.12539)

## Dataset Overview

| Statistic | Value |
|-----------|-------|
| Total samples | 271,835 |
| Datasets/Tasks | 46 |
| Languages | 9 (ar, bn, de, en, es, hi, ro, ru, zh) |
| Splits | train / test / val |
| Test samples with judge scores | 44,370 / 46,401 (95.6%) |

## Structure

The dataset is organized by language:

```
{language}/
  {dataset_name}/
    images/
    train.jsonl
    test.jsonl
    val.jsonl
```

## Fields

**All splits:**

| Field | Description |
|-------|-------------|
| `id` | Unique sample identifier |
| `image` | Relative path to the meme image |
| `text` | OCR/extracted text from the meme |
| `label` | Classification label for the task |
| `task_description` | English description of the classification task |
| `explanation` | LLM-generated English explanation justifying the label |
| `native_label` | (multilingual only) Label in the meme's native language |
| `native_task_description` | (multilingual only) Task description in native language |
| `native_explanation` | (multilingual only) Explanation in native language |

**Test split only (LLM-as-Judge):**

| Field | Description |
|-------|-------------|
| `informativeness` | Average judge score (1–5) from GPT-5 and Gemini-2.5-Pro |
| `clarity` | Average judge score (1–5) from GPT-5 and Gemini-2.5-Pro |
| `plausibility` | Average judge score (1–5) from GPT-5 and Gemini-2.5-Pro |
| `faithfulness` | Average judge score (1–5) from GPT-5 and Gemini-2.5-Pro |
| `llm_judge` | Per-criterion scores and justifications from each judge model |

## Languages and Tasks

| Language | # Tasks | Datasets |
|----------|---------|----------|
| Arabic (ar) | 2 | Hateful_ar__Prop2Hate-Meme, propoganda_ar_ArMeme |
| Bengali (bn) | 5 | abuse, sarcasm, sentiment, vulgar (BanglaAbuseMeme), Hateful (MUTE) |
| German (de) | 1 | Hateful_de__Multi3Hate |
| English (en) | 23 | HarMeme, FHM, MMHS, MAMI, memotion, MET_Meme, Multi3Hate, MIMIC |
| Spanish (es) | 1 | Hateful_es__Multi3Hate |
| Hindi (hi) | 3 | Hateful (Multi3Hate), Misogyny, Misogyny_Categories (MIMIC2024) |
| Romanian (ro) | 4 | deepfake, emotion, political, sentiment (RoMemes) |
| Russian (ru) | 1 | toxic_ru__Toxic_Memes_Detection_Dataset |
| Chinese (zh) | 6 | Hateful (Multi3Hate), intention, metaphor, offensiveness, sentiment (MET_Meme) |

## Citation

```bibtex
@article{memelens2025,
  title={MemeLens: A Multimodal, Multilingual Benchmark for Meme Understanding},
  author={Shahraur, Ali and Bayan, Mohamed and others},
  journal={arXiv preprint arXiv:2601.12539},
  year={2025}
}
```

## Related

- **Dataset (classification only):** [QCRI/MemeLens](https://huggingface.co/datasets/QCRI/MemeLens)
- **Paper:** [arXiv:2601.12539](https://arxiv.org/abs/2601.12539)
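The per-dataset JSONL splits and the test-only judge fields described above could be read with something like the following. This is a minimal sketch, not the authors' tooling. It assumes the repository has already been downloaded to a local directory that follows the `{language}/{dataset_name}/` layout, and that the four judge fields are stored as plain numbers. The helper names (`read_split`, `mean_judge_score`, `well_explained`) and the `4.0` threshold are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path

# Judge criteria listed in the card for the test split.
JUDGE_FIELDS = ("informativeness", "clarity", "plausibility", "faithfulness")


def read_split(dataset_dir, split):
    """Yield one dict per non-empty line of {dataset_dir}/{split}.jsonl."""
    path = Path(dataset_dir) / f"{split}.jsonl"
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def mean_judge_score(record):
    """Average the four judge criteria; None if any criterion is missing.

    The card reports judge scores for 95.6% of test samples, so some
    records are expected to lack these fields.
    """
    values = [record.get(name) for name in JUDGE_FIELDS]
    if any(value is None for value in values):
        return None
    return sum(values) / len(values)


def well_explained(records, threshold=4.0):
    """Keep records whose mean judge score is at least `threshold`.

    The default threshold is an arbitrary illustrative choice.
    """
    kept = []
    for record in records:
        score = mean_judge_score(record)
        if score is not None and score >= threshold:
            kept.append(record)
    return kept
```

With a local copy, `well_explained(read_split(dataset_dir, "test"))` would return the test records of one `{language}/{dataset_name}` directory whose explanations the judges rated highly. The card does not state what the `image` path is relative to, so resolving it against `dataset_dir` is left to the reader to verify.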
Provider:
QCRI