Anish/nepali-meme-captions
Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Anish/nepali-meme-captions
Resource description:
---
language:
- ne
- en
tags:
- nepali
- memes
- hate-speech-detection
- gemini
- caption
- low-resource
- multimodal
---
# NeMeme-CAP: Nepali Meme Captions
## Dataset Summary
- English-language captions generated by Google Gemini for the [CHiPSAL 2026 Subtask A Nepali Meme Dataset](https://github.com/therealthapa/chipsal26-memes).
- The context-aware captions were generated across the training, validation, and test splits.
## Supported Tasks
- **Hateful Meme Classification:** Predict whether a meme is non-hateful (label=0) or hateful (label=1).
- **Multimodal Meme Understanding:** Useful as auxiliary text features or as ground-truth explanations for vision-language models.
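When comparing classifiers trained on these captions, a small scoring helper can be handy. The sketch below computes macro-averaged F1 over the two labels in plain Python. Macro F1 is an assumption made here because the classes are imbalanced; this card does not prescribe a metric. `f1_for_class` and `macro_f1` are hypothetical helper names, and the gold and predicted labels are made up for illustration.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores (0 = non-hate, 1 = hate)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Illustrative labels only, not taken from the dataset.
gold = [0, 1, 1, 1, 0, 1]
pred = [0, 1, 0, 1, 1, 1]
score = macro_f1(gold, pred)
print(round(score, 3))
```

Because each class contributes equally to the average, a model that ignores the minority class is penalized more than it would be under plain accuracy.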
## Dataset Structure
### Data Fields
| Field | Type | Description |
|---------|--------|-------------|
| `index` | string | Filename of the corresponding meme image (e.g., `1154.jpg`), used to join with the original CHIPSAL 2026 image files. To address class imbalance, one additional caption was generated for each image in the minority non-hate class (label=0); these rows carry the suffix `_aug` (e.g., `1154_aug.jpg`). |
| `text` | string | English-language caption of the meme generated by Google Gemini. |
| `label` | int64 | Label inherited from the CHIPSAL 2026 dataset. `0` = non-hate, `1` = hate. |
### Data Splits
| Split | Rows |
|------------|-------|
| Train | 1,420 |
| Validation | 133 |
| Test | 134 |
| **Total** | **1,683** |
In the training split, the minority non-hate class (label=0) was augmented by generating a second caption for each of its images, relying on the stochastic sampling (`temperature=1`) of Gemini models to produce a different caption.
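Because an augmented row describes the same image as an original row, it can help to tell the two apart, for example when counting labels or building a custom re-split. A minimal sketch, assuming augmented rows are identifiable only by the `_aug` suffix in `index`; `is_augmented` and `label_counts` are hypothetical helper names, and the sample rows are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def is_augmented(index):
    """True if the filename stem ends with the `_aug` suffix."""
    return PurePosixPath(index).stem.endswith("_aug")


def label_counts(rows):
    """Count labels separately for original and augmented rows."""
    counts = {"original": Counter(), "augmented": Counter()}
    for row in rows:
        key = "augmented" if is_augmented(row["index"]) else "original"
        counts[key][row["label"]] += 1
    return counts


# Illustrative rows only; real rows come from the dataset's train split.
rows = [
    {"index": "1154.jpg", "text": "...", "label": 0},
    {"index": "1154_aug.jpg", "text": "...", "label": 0},
    {"index": "2001.jpg", "text": "...", "label": 1},
]
counts = label_counts(rows)
print(counts)
```

With the real data, the same function can be called on `ds["train"]`, since iterating over a split yields one dictionary per row.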
## How to Use
```python
from datasets import load_dataset
ds = load_dataset("Anish/nepali-meme-captions")
# access training split
print(ds["train"][0])
# {'index': '1154.jpg', 'text': '...', 'label': 0}
```
To use this dataset alongside meme images, download the images from the [CHIPSAL 2026 GitHub repository](https://github.com/therealthapa/chipsal26-memes) for Subtask A and join on the `index` field.
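The join can be sketched as follows. It assumes the images have already been downloaded into a single local folder (the folder passed as `image_dir` is a placeholder, since the repository's layout is not described here) and that an `_aug` row maps back to the image file without the suffix. `image_name` and `attach_image_paths` are hypothetical helpers, and the demo uses a temporary folder with an empty stand-in file.

```python
import tempfile
from pathlib import Path


def image_name(index):
    """Map an `index` value to the image filename, dropping any `_aug` suffix."""
    p = Path(index)
    stem = p.stem
    if stem.endswith("_aug"):
        stem = stem[: -len("_aug")]
    return stem + p.suffix


def attach_image_paths(rows, image_dir):
    """Return (rows with an `image_path` key, index values with no image found)."""
    image_dir = Path(image_dir)
    joined, missing = [], []
    for row in rows:
        path = image_dir / image_name(row["index"])
        if path.is_file():
            joined.append({**row, "image_path": str(path)})
        else:
            missing.append(row["index"])
    return joined, missing


# Demo with a temporary folder standing in for the downloaded images.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1154.jpg").touch()
    rows = [
        {"index": "1154.jpg", "text": "...", "label": 0},
        {"index": "1154_aug.jpg", "text": "...", "label": 0},
        {"index": "9999.jpg", "text": "...", "label": 1},
    ]
    joined, missing = attach_image_paths(rows, tmp)

print(len(joined), missing)
```

Returning the unmatched `index` values separately makes it easy to spot a wrong image folder before training.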
## Limitations
- Meme images are not included. See the [CHIPSAL 2026 repository](https://github.com/therealthapa/chipsal26-memes) for the original meme images.
- The captions are machine-generated, so their accuracy depends on Gemini's visual understanding of each meme.
## Contact
For questions or feedback, please open a discussion or contact the dataset author.
Provider:
Anish



