
Anish/nepali-meme-captions

Hugging Face | Updated 2026-03-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Anish/nepali-meme-captions
Description:
---
language:
- ne
- en
tags:
- nepali
- memes
- hate-speech-detection
- gemini
- caption
- low-resource
- multimodal
---

# NeMeme-CAP: Nepali Meme Captions

## Dataset Summary

- English-language captions generated by Google Gemini for the [CHiPSAL 2026 Subtask A Nepali Meme Dataset](https://github.com/therealthapa/chipsal26-memes).
- The context-aware captions were generated across the training, validation, and test splits.

## Supported Tasks

- **Hateful Meme Classification:** Predict whether a meme is non-hateful (`label=0`) or hateful (`label=1`).
- **Multimodal Meme Understanding:** The captions can serve as auxiliary text features or as ground-truth explanations for vision-language models.

## Dataset Structure

### Data Fields

| Field | Type | Description |
|---------|--------|-------------|
| `index` | string | Filename of the corresponding meme image (e.g., `1154.jpg`), used to join with the original CHiPSAL 2026 image files. To address the class imbalance of the minority non-hate class (`label=0`), one additional caption was generated for each image in that class, indicated by the suffix `_aug` (e.g., `1154_aug.jpg`). |
| `text` | string | English-language caption of the meme generated by Google Gemini. |
| `label` | int64 | Label inherited from the CHiPSAL 2026 dataset. `0` = non-hate, `1` = hate. |

### Data Splits

| Split | Rows |
|------------|-------|
| Train | 1,420 |
| Validation | 133 |
| Test | 134 |
| **Total** | **1,683** |

In the training split, the minority non-hate class was augmented by doubling its generated captions, leveraging the stochastic sampling (`temperature=1`) of Gemini models.

## How to Use

```python
from datasets import load_dataset

ds = load_dataset("Anish/nepali-meme-captions")

# Access the training split
print(ds["train"][0])
# {'index': '1154.jpg', 'text': '...', 'label': 0}
```

To use this dataset alongside the meme images, download the images for Subtask A from the [CHiPSAL 2026 GitHub repository](https://github.com/therealthapa/chipsal26-memes) and join on the `index` field.
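
The join on `index` can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: it assumes that an augmented row such as `1154_aug.jpg` refers to the same image file as `1154.jpg` (the card describes `_aug` as an extra caption, not an extra image), and that the images have been downloaded into a single local folder. The helper names `source_image_name` and `attach_image_paths`, the `image_path` key, and the `images` folder are hypothetical.

```python
from pathlib import Path

AUG_SUFFIX = "_aug"  # suffix described in the Data Fields table


def source_image_name(index: str) -> str:
    """Map a caption row's `index` to the original image filename.

    Assumption: '1154_aug.jpg' shares its image with '1154.jpg'.
    A name without the suffix is returned unchanged.
    """
    p = Path(index)
    stem = p.stem
    if stem.endswith(AUG_SUFFIX):
        stem = stem[: -len(AUG_SUFFIX)]
    return stem + p.suffix


def attach_image_paths(rows, image_dir):
    """Yield a copy of each row with an added `image_path` key.

    `rows` is any iterable of dicts with an `index` field, for example
    a split returned by `load_dataset`. `image_dir` is the local folder
    holding the downloaded CHiPSAL images (layout is an assumption).
    """
    image_dir = Path(image_dir)
    for row in rows:
        name = source_image_name(row["index"])
        yield {**row, "image_path": str(image_dir / name)}


if __name__ == "__main__":
    sample = [
        {"index": "1154.jpg", "text": "...", "label": 0},
        {"index": "1154_aug.jpg", "text": "...", "label": 0},
    ]
    for row in attach_image_paths(sample, "images"):
        print(row["index"], "->", row["image_path"])
```

With the real dataset, `attach_image_paths(ds["train"], "images")` would be the equivalent call. It is worth checking that the resulting paths exist before training, since the folder layout of the image repository is not described in this card.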

## Limitations

- Meme images are not included. Visit the [CHiPSAL 2026 repository](https://github.com/therealthapa/chipsal26-memes) for the original meme images.
- Caption accuracy depends on Gemini's visual understanding.

## Contact

For questions or feedback, please open a discussion or contact the dataset author.
Provider:
Anish