changdae/vittle-llavabench-coco-textual-perturbed
Source: Hugging Face | Updated 2026-04-10 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/changdae/vittle-llavabench-coco-textual-perturbed
---
license: mit
task_categories:
- visual-question-answering
tags:
- robustness
- LLaVA-Bench
- COCO
- perturbation
- vittle
- text-perturbation
pretty_name: "Vittle - Textually Perturbed LLaVA-Bench-COCO"
size_categories:
- n<1K
---
# Vittle - Textually Perturbed LLaVA-Bench-COCO
This dataset provides **textually perturbed** variants of the [LLaVA-Bench (COCO)](https://arxiv.org/abs/2304.08485) open-ended VQA benchmark.
It is released as part of the [Vittle (Visual Instruction Bottleneck Tuning)](https://arxiv.org/abs/2505.13946) project (NeurIPS 2025).
## Overview
- **Questions**: 90 base questions × 9 textual perturbation variants = 810 perturbed questions, plus the 90 clean originals.
- **Images**: 30 unique COCO val2014 images (clean, unperturbed)
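The counts above can be sanity-checked on a local copy. This is a minimal sketch that assumes the repository was downloaded with the layout shown under File Structure, and that each JSONL record carries an `image` field naming its COCO file, as in the original LLaVA-Bench `qa90_questions.jsonl`. That field name is an assumption, not something this card specifies, and `check_counts` is a hypothetical helper.

```python
import json
from pathlib import Path

# Variant suffixes copied from the perturbation tables in this card.
VARIANTS = [
    "rd_7", "rs_4", "ri_4", "KeyboardAug_3",
    "RandomCharAug_delete_3", "RandomCharAug_insert_3",
    "Hindi", "Greek", "Arabic",
]


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def check_counts(root: Path) -> dict[str, int]:
    """Return per-variant question counts, their total, and the unique image count."""
    counts: dict[str, int] = {}
    images: set[str] = set()
    for name in VARIANTS:
        path = root / "questions_perturbed" / f"qa90_questions_{name}.jsonl"
        records = read_jsonl(path)
        counts[name] = len(records)
        # "image" is an assumed field name (LLaVA-Bench convention).
        images.update(r["image"] for r in records)
    counts["total"] = sum(counts[name] for name in VARIANTS)
    counts["unique_images"] = len(images)
    return counts
```

If the card's numbers hold, `check_counts(Path("."))` should report 90 per variant, a `total` of 810, and 30 `unique_images`.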
## Textual Perturbations
Character- and word-level perturbations were generated following [MM-Robustness](https://github.com/Jielin-Qiu/MM_Robustness); sentence-level perturbations (translation) were generated with GPT-4o.
### Char/Word-level Perturbations
| Perturbation | File | Description |
|---|---|---|
| Random Delete | `qa90_questions_rd_7.jsonl` | Random word deletion (severity 7) |
| Random Swap | `qa90_questions_rs_4.jsonl` | Random word swap (severity 4) |
| Random Insert | `qa90_questions_ri_4.jsonl` | Random word insertion (severity 4) |
| Keyboard Aug | `qa90_questions_KeyboardAug_3.jsonl` | Keyboard-based typo augmentation (severity 3) |
| Char Delete | `qa90_questions_RandomCharAug_delete_3.jsonl` | Random character deletion augmentation (severity 3) |
| Char Insert | `qa90_questions_RandomCharAug_insert_3.jsonl` | Random character insertion augmentation (severity 3) |
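To give a feel for what the character-level rows do, here is a toy sketch of random character deletion and insertion. It is not the MM-Robustness implementation: the per-character probability `p` is a hypothetical stand-in, and how the severity numbers in the file names (e.g. `_3`) map to perturbation strength is not documented in this card.

```python
import random
import string


def char_delete(text: str, p: float, rng: random.Random) -> str:
    """Drop each letter independently with probability p (toy illustration)."""
    # Non-letters (spaces, punctuation) are always kept; `and` short-circuits,
    # so the RNG is only consumed for letters.
    return "".join(c for c in text if not (c.isalpha() and rng.random() < p))


def char_insert(text: str, p: float, rng: random.Random) -> str:
    """After each letter, insert a random lowercase letter with probability p."""
    out: list[str] = []
    for c in text:
        out.append(c)
        if c.isalpha() and rng.random() < p:
            out.append(rng.choice(string.ascii_lowercase))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is reproducible
    question = "What is the man holding in his hand?"  # made-up example question
    print(char_delete(question, 0.1, rng))
    print(char_insert(question, 0.1, rng))
```

Passing an explicit `random.Random` instance keeps each perturbed file reproducible from a seed without touching global random state.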
### Sentence-level Perturbations (Translation)
| Perturbation | File | Description |
|---|---|---|
| Hindi | `qa90_questions_Hindi.jsonl` | GPT-4o translation to Hindi |
| Greek | `qa90_questions_Greek.jsonl` | GPT-4o translation to Greek |
| Arabic | `qa90_questions_Arabic.jsonl` | GPT-4o translation to Arabic |
## File Structure
```
.
├── README.md
├── qa90_questions.jsonl # 90 original (clean) questions
├── questions_perturbed/
│   ├── qa90_questions_rd_7.jsonl
│   ├── qa90_questions_rs_4.jsonl
│   ├── qa90_questions_ri_4.jsonl
│   ├── qa90_questions_KeyboardAug_3.jsonl
│   ├── qa90_questions_RandomCharAug_delete_3.jsonl
│   ├── qa90_questions_RandomCharAug_insert_3.jsonl
│   ├── qa90_questions_Hindi.jsonl
│   ├── qa90_questions_Greek.jsonl
│   └── qa90_questions_Arabic.jsonl
└── images/
    └── val2014/  # 30 clean COCO images
```
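A minimal loader for feeding these files into an evaluation loop might look like the following. It assumes each record uses the LLaVA-Bench `qa90_questions.jsonl` fields `question_id`, `image`, and `text`, and that perturbed files keep the same `question_id` and `image` as their clean source; verify both against the actual files. `load_variant`, `pair_with_clean`, and `VQAItem` are hypothetical names, and `variant` is the file-name suffix after `qa90_questions_` (e.g. `rd_7`, `Hindi`).

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class VQAItem:
    question_id: int
    image_path: Path
    question: str


def load_variant(root: Path, variant: Optional[str] = None) -> list[VQAItem]:
    """Load the clean questions (variant=None) or one perturbed variant."""
    if variant is None:
        path = root / "qa90_questions.jsonl"
    else:
        path = root / "questions_perturbed" / f"qa90_questions_{variant}.jsonl"
    items: list[VQAItem] = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            # Field names below are assumed from the LLaVA-Bench format.
            items.append(VQAItem(
                question_id=rec["question_id"],
                image_path=root / "images" / "val2014" / rec["image"],
                question=rec["text"],
            ))
    return items


def pair_with_clean(root: Path, variant: str) -> list[tuple[str, str]]:
    """Align each perturbed question with its clean counterpart by question_id."""
    clean = {it.question_id: it.question for it in load_variant(root)}
    return [(clean[it.question_id], it.question) for it in load_variant(root, variant)]
```

For example, `pair_with_clean(Path("."), "Greek")` would return (clean, translated) question pairs in the perturbed file's order, which is convenient for side-by-side inspection before running a model.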
## Citation
```bibtex
@inproceedings{
oh2025visual,
title={Visual Instruction Bottleneck Tuning},
author={Changdae Oh and Jiatong Li and Shawn Im and Sharon Li},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=yzHiEmLSk8}
}
```
## License
MIT
Provided by:
changdae



