changdae/vittle-pope-visual-perturbed
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/changdae/vittle-pope-visual-perturbed
---
license: mit
task_categories:
- visual-question-answering
tags:
- robustness
- hallucination
- POPE
- COCO
- perturbation
- vittle
pretty_name: "Vittle - Visually Perturbed POPE Benchmark"
size_categories:
- 1K<n<10K
---
# Vittle - Visually Perturbed POPE Benchmark
This dataset provides **visually perturbed** variants of the [POPE (Polling-based Object Probing Evaluation)](https://arxiv.org/abs/2305.10355) benchmark, built on COCO val2014 images.
It is released as part of the [Vittle (Visual Instruction Bottleneck Tuning)](https://arxiv.org/abs/2505.13946) project (NeurIPS 2025).
## Overview
- **Questions**: 9,000 yes/no object hallucination probing questions (3,000 each for adversarial / popular / random splits)
- **Images**: 500 unique COCO val2014 images, each with 9 visual perturbation variants (severity level 3)
- **Total image files**: 4,500 (500 images × 9 perturbations)
## Visual Perturbations
All perturbations use severity level 3 and were generated following [MM-Robustness](https://github.com/Jielin-Qiu/MM_Robustness):
| Perturbation | Folder |
|---|---|
| Gaussian Noise | `images/COCO_IP_gaussian_noise_3/` |
| Shot Noise | `images/COCO_IP_shot_noise_3/` |
| Speckle Noise | `images/COCO_IP_speckle_noise_3/` |
| Fog | `images/COCO_IP_fog_3/` |
| Contrast | `images/COCO_IP_contrast_3/` |
| Brightness | `images/COCO_IP_brightness_3/` |
| Defocus Blur | `images/COCO_IP_defocus_blur_3/` |
| Zoom Blur | `images/COCO_IP_zoom_blur_3/` |
| Frost | `images/COCO_IP_frost_3/` |
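The folder names in the table follow a regular `COCO_IP_<perturbation>_<severity>` pattern, so a path to any perturbed image can be built programmatically. The snippet below is a minimal sketch: `perturbed_image_path` is a hypothetical helper, `root` is wherever you downloaded the dataset, and it assumes that files inside each folder keep the original COCO file names used in the question file, which this card does not state explicitly.

```python
from pathlib import Path

# Perturbation identifiers as they appear in the folder names listed above.
PERTURBATIONS = [
    "gaussian_noise", "shot_noise", "speckle_noise",
    "fog", "contrast", "brightness",
    "defocus_blur", "zoom_blur", "frost",
]
SEVERITY = 3  # the only severity level shipped in this dataset


def perturbed_image_path(root, perturbation, image_name, severity=SEVERITY):
    """Build the path to one perturbed image (hypothetical helper).

    Assumes image_name matches the file name stored in the folder,
    e.g. the "image" field of a question record.
    """
    if perturbation not in PERTURBATIONS:
        raise ValueError(f"unknown perturbation: {perturbation!r}")
    folder = f"COCO_IP_{perturbation}_{severity}"
    return Path(root) / "images" / folder / image_name
```

Looping over `PERTURBATIONS` with the same `image_name` gives the nine variants of a single source image, which is convenient when running the same question under every perturbation.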
## File Structure
```
.
├── README.md
├── llava_pope_test.jsonl # 9,000 questions
├── annotations/
│   ├── coco_pope_adversarial.json   # 3,000 adversarial split labels
│   ├── coco_pope_popular.json       # 3,000 popular split labels
│   └── coco_pope_random.json        # 3,000 random split labels
└── images/
    ├── COCO_IP_gaussian_noise_3/    # 500 images
    ├── COCO_IP_shot_noise_3/
    ├── COCO_IP_speckle_noise_3/
    ├── COCO_IP_fog_3/
    ├── COCO_IP_contrast_3/
    ├── COCO_IP_brightness_3/
    ├── COCO_IP_defocus_blur_3/
    ├── COCO_IP_zoom_blur_3/
    └── COCO_IP_frost_3/
```
```
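After downloading, it can be worth checking that the local copy matches the layout above (9 folders under `images/`, 500 images each). The following is a rough sketch rather than an official validation script: `count_images` and `check_layout` are hypothetical names, and the `.jpg` extension is an assumption based on the file name shown in the question example.

```python
from pathlib import Path

EXPECTED_FOLDERS = 9               # one folder per perturbation
EXPECTED_IMAGES_PER_FOLDER = 500   # per the Overview section


def count_images(root):
    """Map each folder under <root>/images to its number of .jpg files."""
    images_dir = Path(root) / "images"
    return {
        folder.name: sum(1 for f in folder.iterdir() if f.suffix.lower() == ".jpg")
        for folder in sorted(images_dir.iterdir())
        if folder.is_dir()
    }


def check_layout(root):
    """Return a list of problems; an empty list means the layout matches."""
    counts = count_images(root)
    problems = []
    if len(counts) != EXPECTED_FOLDERS:
        problems.append(f"expected {EXPECTED_FOLDERS} folders, found {len(counts)}")
    for name, n in counts.items():
        if n != EXPECTED_IMAGES_PER_FOLDER:
            problems.append(f"{name}: expected {EXPECTED_IMAGES_PER_FOLDER} images, found {n}")
    return problems
```

Calling `check_layout("path/to/dataset")` on a complete download should return an empty list if the assumptions above hold.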
## Question Format (JSONL)
```json
{"question_id": 0, "image": "COCO_val2014_000000007991.jpg", "text": "Is there a snowboard in the image?\nAnswer the question using a single word or phrase.", "category": "adversarial"}
```
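Each line of `llava_pope_test.jsonl` is a standalone JSON object, and the `text` field packs the question and an answer-format instruction together, separated by a newline. A minimal sketch of reading the file and splitting the prompt is shown below; `load_questions` and `split_prompt` are hypothetical helper names, and the only fields assumed are the four shown in the example record.

```python
import json
from collections import Counter

# The example record from this card. A raw string keeps the JSON "\n"
# escape intact instead of turning it into a literal newline.
SAMPLE_LINE = r'{"question_id": 0, "image": "COCO_val2014_000000007991.jpg", "text": "Is there a snowboard in the image?\nAnswer the question using a single word or phrase.", "category": "adversarial"}'


def load_questions(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def split_prompt(text):
    """Split the 'text' field into (question, answer-format instruction)."""
    question, _, instruction = text.partition("\n")
    return question, instruction


def category_counts(questions):
    """Count questions per category (adversarial / popular / random)."""
    return Counter(q["category"] for q in questions)
```

To use it on the real file, pass an open file handle, for example `load_questions(open("llava_pope_test.jsonl", encoding="utf-8"))`. Going by the Overview, `category_counts` should then report 3,000 questions for each of the three categories.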
## Citation
```bibtex
@inproceedings{
oh2025visual,
title={Visual Instruction Bottleneck Tuning},
author={Changdae Oh and Jiatong Li and Shawn Im and Sharon Li},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=yzHiEmLSk8}
}
```
## License
MIT
Provider:
changdae



