yilingwang/visual-contrastive-dpo-dataset
Source: Hugging Face, updated 2026-03-23, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/yilingwang/visual-contrastive-dpo-dataset
---
license: cc-by-nc-4.0
task_categories:
- image-text-to-text
- visual-question-answering
language:
- en
tags:
- multimodal
- dpo
- hallucination
- preference-learning
- vlm
size_categories:
- 1K<n<10K
---
# Visual Contrastive DPO Dataset
A dataset of hallucination-aligned negative images, built for multimodal DPO training.
## Dataset Summary
This dataset contains **8,456 preference pairs** with **7,971 generated negative images** for cross-modal DPO training of vision-language models (VLMs). Each negative image is generated to depict the hallucinated content of the model's rejected response.
## Data Construction Pipeline
1. **Self-rollout sampling**: Qwen2.5-VL-3B generates 8 candidate responses per (image, question) pair
2. **Judge scoring**: Qwen3-8B scores each response on a 0-10 scale against the ground-truth answer
3. **Pair selection**: Keep pairs whose chosen score is ≥ 6, rejected score is ≤ 5, and score margin is ≥ 4
4. **Semantic extraction**: Extract the key differences between the chosen and rejected responses
5. **Image generation**: Qwen-Image generates negative images that depict the rejected (hallucinated) content
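A minimal sketch of the step 3 selection rule. It assumes each rollout group holds the 8 sampled responses together with their judge scores. It also assumes that the highest-scoring and lowest-scoring responses become the chosen and rejected candidates, which the card does not specify. `select_pair` and the threshold constant names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Thresholds from step 3 (judge scores are on a 0-10 scale).
MIN_CHOSEN = 6
MAX_REJECTED = 5
MIN_MARGIN = 4


def select_pair(
    responses: Sequence[str], scores: Sequence[float]
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) for one rollout group, or None if no pair
    passes the thresholds.

    Assumption (not stated in the card): the highest-scoring response is the
    chosen candidate and the lowest-scoring one is the rejected candidate.
    """
    if not responses or len(responses) != len(scores):
        raise ValueError("need exactly one score per response")
    best = max(range(len(scores)), key=lambda i: scores[i])
    worst = min(range(len(scores)), key=lambda i: scores[i])
    chosen_score, rejected_score = scores[best], scores[worst]
    if (
        chosen_score >= MIN_CHOSEN
        and rejected_score <= MAX_REJECTED
        and chosen_score - rejected_score >= MIN_MARGIN
    ):
        return responses[best], responses[worst]
    return None


if __name__ == "__main__":
    # Eight rollouts for one (image, question) pair with made-up scores.
    responses = [f"answer {i}" for i in range(8)]
    scores = [9, 7, 3, 8, 2, 6, 5, 4]
    print(select_pair(responses, scores))  # ('answer 0', 'answer 4')
```

Pairing the extremes maximizes the score margin. Other pairing schemes, such as keeping every chosen/rejected combination that clears the thresholds, would satisfy the stated rule just as well.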
## Files
- `rlhfv_ovip_dpo.jsonl` - Full dataset (8,456 pairs)
- `rlhfv_ovip_dpo_filtered.jsonl` - Filtered subset (3,493 pairs) with length-biased and reference-fallback pairs removed
- `images/` - Original images from RLAIF-V and RLHF-V
- `generated_images/` - Generated negative images (384×384)
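A hypothetical loader for the JSONL files listed above. It assumes a local copy of the repository at `root`, and it assumes that `image` and `edited_image_path` are paths relative to that root, as the example record in Data Format suggests. `iter_pairs` is an illustrative helper and is not shipped with the dataset.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_pairs(root: Path, filename: str = "rlhfv_ovip_dpo.jsonl") -> Iterator[dict]:
    """Yield one preference pair per non-empty JSONL line, with image paths
    resolved against the dataset root directory.

    Assumption: `edited_image_path` may be absent or empty for pairs that
    have no generated negative image, so it is only resolved when present.
    """
    with open(root / filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            record["image"] = str(root / record["image"])
            if record.get("edited_image_path"):
                record["edited_image_path"] = str(root / record["edited_image_path"])
            yield record
```

For example, `iter_pairs(Path("visual-contrastive-dpo-dataset"), "rlhfv_ovip_dpo_filtered.jsonl")` would iterate over the filtered subset. The directory name here is only a placeholder.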
## Data Format
```json
{
  "image": "images/rlhf/1640.jpg",
  "question": "Is there only one person visible?",
  "chosen": "Yes, only one person is visible.",
  "rejected": "No, there is no person visible.",
  "id": "rlhf_1640",
  "edited_image_path": "generated_images/rlhf_1640_neg.png",
  "edit_original": "one person visible",
  "edit_modified": "no person visible",
  "edit_difference": "presence vs absence of person"
}
```
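One way to represent the record layout above as a typed Python object. `PreferencePair` is a hypothetical class. Treating the `edit_*` fields as optional is also an assumption: the card lists fewer generated images (7,971) than pairs (8,456), so some records may lack them.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PreferencePair:
    """One JSONL record, with field names taken from the example record."""

    id: str
    image: str       # original image from RLAIF-V / RLHF-V
    question: str
    chosen: str      # preferred response
    rejected: str    # hallucinated response
    # Generated-negative fields. Assumption: they may be missing.
    edited_image_path: Optional[str] = None
    edit_original: Optional[str] = None
    edit_modified: Optional[str] = None
    edit_difference: Optional[str] = None

    @classmethod
    def from_dict(cls, d: Mapping[str, Any]) -> "PreferencePair":
        """Build a record, raising KeyError if a required field is missing."""
        required = ("id", "image", "question", "chosen", "rejected")
        missing = [k for k in required if k not in d]
        if missing:
            raise KeyError(f"record is missing required fields: {missing}")
        return cls(
            id=d["id"],
            image=d["image"],
            question=d["question"],
            chosen=d["chosen"],
            rejected=d["rejected"],
            edited_image_path=d.get("edited_image_path"),
            edit_original=d.get("edit_original"),
            edit_modified=d.get("edit_modified"),
            edit_difference=d.get("edit_difference"),
        )

    @property
    def has_negative_image(self) -> bool:
        """True when the record points at a generated negative image."""
        return self.edited_image_path is not None
```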
## Source Data
- RLAIF-V (openbmb/RLAIF-V-Dataset): ~10K examples sampled
- RLHF-V (openbmb/RLHF-V-Dataset): ~5.7K examples (full dataset)
## Models Used
| Component | Model |
|-----------|-------|
| Sampling | Qwen2.5-VL-3B-Instruct |
| Judge | Qwen3-8B |
| Image Generation | Tongyi-MAI/Z-Image-Turbo |
Provided by:
yilingwang



