PahaII/vllm_safety_evaluation
Source: Hugging Face. Updated: 2023-11-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/PahaII/vllm_safety_evaluation
Resource description:
---
license: apache-2.0
---
# How Many Unicorns Are In This Image? A Safety Evaluation Benchmark For Vision LLMs (Dataset)
Paper: https://arxiv.org/abs/2311.16101
Code: https://github.com/UCSC-VLAA/vllm-safety-benchmark
The full dataset should look like this:
```
.
├── ./safety_evaluation_benchmark_datasets//
    ├── gpt4v_challenging_set # Contains the challenging test data for GPT4V
        ├── attack_images
        ├── sketchy_images
        ├── oodcv_images
        ├── misleading-attack.json
        ├── sketchy-vqa-challenging.json
        └── oodcv-vqa-counterfactual.json
    ├── redteaming-mislead # Contains the test data for redteaming tasks
        ├── redteaming_attack
            ├── gaussian_noise
            ├── mixattack_eps32
            ├── mixattack_eps64
            ├── sinattack_eps64_dog
            ├── sinattack_eps64_coconut
            ├── sinattack_eps64_spaceship
            └── annotation.json
        └── jailbreak_llm # adversarial suffixes for jailbreaking VLLM through LLM
    └── ood # Contains the test data for OOD scenarios
        ├── sketchy-vqa
            ├── sketchy-vqa.json
            └── sketchy-challenging.json
        └── oodcv-vqa
            ├── oodcv-vqa.json
            └── oodcv-counterfactual.json
```
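After downloading, it can help to confirm that the local copy matches this layout. The following is a minimal sketch, not part of the official code: the `missing_entries` helper and the `EXPECTED_ENTRIES` list are hypothetical names, and the paths assume the nesting implied by the tree's `└──` markers (for example, that `annotation.json` sits inside `redteaming_attack`).

```python
from pathlib import Path

# Expected entries, transcribed from the directory tree in this README.
# Paths are relative to the dataset root folder. The nesting is an
# assumption read off the tree, not something confirmed by the authors.
EXPECTED_ENTRIES = [
    "gpt4v_challenging_set/attack_images",
    "gpt4v_challenging_set/sketchy_images",
    "gpt4v_challenging_set/oodcv_images",
    "gpt4v_challenging_set/misleading-attack.json",
    "gpt4v_challenging_set/sketchy-vqa-challenging.json",
    "gpt4v_challenging_set/oodcv-vqa-counterfactual.json",
    "redteaming-mislead/redteaming_attack/gaussian_noise",
    "redteaming-mislead/redteaming_attack/mixattack_eps32",
    "redteaming-mislead/redteaming_attack/mixattack_eps64",
    "redteaming-mislead/redteaming_attack/sinattack_eps64_dog",
    "redteaming-mislead/redteaming_attack/sinattack_eps64_coconut",
    "redteaming-mislead/redteaming_attack/sinattack_eps64_spaceship",
    "redteaming-mislead/redteaming_attack/annotation.json",
    "redteaming-mislead/jailbreak_llm",
    "ood/sketchy-vqa/sketchy-vqa.json",
    "ood/sketchy-vqa/sketchy-challenging.json",
    "ood/oodcv-vqa/oodcv-vqa.json",
    "ood/oodcv-vqa/oodcv-counterfactual.json",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

Calling `missing_entries("safety_evaluation_benchmark_datasets")` returns an empty list when every listed file and folder is present, and otherwise the relative paths that still need to be downloaded or unpacked.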
Provider:
PahaII
Summary of original information
How Many Unicorns Are In This Image? A Safety Evaluation Benchmark For Vision LLMs (Dataset)
Dataset structure
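The dataset consists of the following main parts:
1. gpt4v_challenging_set
- Description: challenging test data for GPT4V.
- Contents:
  - attack_images: attack images
  - sketchy_images: sketch images
  - oodcv_images: OODCV images
  - misleading-attack.json: misleading-attack data
  - sketchy-vqa-challenging.json: challenging Sketchy-VQA data
  - oodcv-vqa-counterfactual.json: counterfactual OODCV-VQA data
2. redteaming-mislead
- Description: test data for the redteaming tasks.
- Contents:
  - redteaming_attack: redteaming attack data
    - gaussian_noise: Gaussian noise
    - mixattack_eps32: mixattack, eps 32
    - mixattack_eps64: mixattack, eps 64
    - sinattack_eps64_dog: sinattack, eps 64, "dog"
    - sinattack_eps64_coconut: sinattack, eps 64, "coconut"
    - sinattack_eps64_spaceship: sinattack, eps 64, "spaceship"
    - annotation.json: annotation data
  - jailbreak_llm: adversarial suffixes for jailbreaking the VLLM through the LLM
3. ood
- Description: test data for OOD scenarios.
- Contents:
  - sketchy-vqa: Sketchy-VQA data
    - sketchy-vqa.json: Sketchy-VQA data
    - sketchy-challenging.json: challenging Sketchy data
  - oodcv-vqa: OODCV-VQA data
    - oodcv-vqa.json: OODCV-VQA data
    - oodcv-counterfactual.json: counterfactual OODCV data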
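The field layout of the JSON files is not documented here, so a sensible first step is to inspect a file before writing a loader for it. The sketch below makes no assumption about the schema; `summarize_json` is a hypothetical helper name, and it only reports the top-level type, the number of entries, and the keys it finds.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "type": type(data).__name__,
        # Entry count only makes sense for lists and dicts.
        "size": len(data) if isinstance(data, (list, dict)) else None,
    }
    # Peek at the first record (or the top-level keys) to discover field names.
    if isinstance(data, list) and data and isinstance(data[0], dict):
        summary["first_record_keys"] = sorted(data[0])
    elif isinstance(data, dict):
        summary["top_level_keys_sample"] = sorted(data)[:5]
    return summary
```

For example, `summarize_json("ood/oodcv-vqa/oodcv-vqa.json")` (path relative to the dataset root) shows whether the file is a list of records or a dictionary, and which keys to expect.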



