yilingwang/visual-contrastive-dpo-dataset

Name: yilingwang/visual-contrastive-dpo-dataset
Creator: yilingwang
Published: 2026-03-23 14:38:22
License: not specified in the listing (the dataset card below declares cc-by-nc-4.0)

Source: Hugging Face
Updated: 2026-03-23
Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/yilingwang/visual-contrastive-dpo-dataset

Resource description:

---
license: cc-by-nc-4.0
task_categories:
- image-text-to-text
- visual-question-answering
language:
- en
tags:
- multimodal
- dpo
- hallucination
- preference-learning
- vlm
size_categories:
- 1K<n<10K
---

# Visual Contrastive DPO Dataset

Hallucination-aligned visual negative construction dataset for multimodal DPO training.

## Dataset Summary

This dataset contains **8,456 preference pairs** with **7,971 generated negative images** for cross-modal DPO training on VLMs. Each negative image is generated to visually depict the hallucinated content from the model's rejected response.

## Data Construction Pipeline

1. **Self-rollout sampling**: Qwen2.5-VL-3B generates 8 candidate responses per (image, question) pair
2. **Judge scoring**: Qwen3-8B scores each response (0-10) against ground truth
3. **Pair selection**: Filter pairs with chosen score ≥ 6, rejected score ≤ 5, margin ≥ 4
4. **Semantic extraction**: Extract key differences between chosen/rejected responses
5. **Image generation**: Qwen-Image generates negative images matching rejected (hallucinated) content

## Files

- `rlhfv_ovip_dpo.jsonl` - Full dataset (8,456 pairs)
- `rlhfv_ovip_dpo_filtered.jsonl` - Filtered dataset (3,493 pairs, removed length-biased and reference-fallback pairs)
- `images/` - Original images from RLAIF-V and RLHF-V
- `generated_images/` - Generated negative images (384×384)

## Data Format

```json
{
  "image": "images/rlhf/1640.jpg",
  "question": "Is there only one person visible?",
  "chosen": "Yes, only one person is visible.",
  "rejected": "No, there is no person visible.",
  "id": "rlhf_1640",
  "edited_image_path": "generated_images/rlhf_1640_neg.png",
  "edit_original": "one person visible",
  "edit_modified": "no person visible",
  "edit_difference": "presence vs absence of person"
}
```

## Source Data

- RLAIF-V (openbmb/RLAIF-V-Dataset): ~10K sampled
- RLHF-V (openbmb/RLHF-V-Dataset): ~5.7K full

## Models Used

| Component | Model |
|-----------|-------|
| Sampling | Qwen2.5-VL-3B-Instruct |
| Judge | Qwen3-8B |
| Image Generation | Tongyi-MAI/Z-Image-Turbo |
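
The pair-selection rule in step 3 of the construction pipeline (chosen score ≥ 6, rejected score ≤ 5, margin ≥ 4) can be illustrated with a minimal sketch. The card does not say how the chosen and rejected responses are picked from the 8 candidates, so this sketch assumes the highest-scoring candidate is the chosen one and the lowest-scoring is the rejected one. The names `ScoredResponse` and `select_pair` are hypothetical and are not part of the dataset's tooling.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ScoredResponse:
    """One candidate response with its judge score (0-10 scale)."""
    text: str
    score: float


def select_pair(
    candidates: Sequence[ScoredResponse],
    min_chosen: float = 6,
    max_rejected: float = 5,
    min_margin: float = 4,
) -> Optional[Tuple[ScoredResponse, ScoredResponse]]:
    """Return (chosen, rejected) if the thresholds from the card are met.

    ASSUMPTION: chosen = highest-scoring candidate, rejected = lowest-scoring.
    The dataset card only states the three thresholds, not the pairing strategy.
    """
    if len(candidates) < 2:
        return None
    chosen = max(candidates, key=lambda c: c.score)
    rejected = min(candidates, key=lambda c: c.score)
    meets_thresholds = (
        chosen.score >= min_chosen
        and rejected.score <= max_rejected
        and chosen.score - rejected.score >= min_margin
    )
    return (chosen, rejected) if meets_thresholds else None


if __name__ == "__main__":
    rollouts = [
        ScoredResponse("Yes, only one person is visible.", 9),
        ScoredResponse("No, there is no person visible.", 2),
        ScoredResponse("There may be one person.", 6),
    ]
    pair = select_pair(rollouts)
    if pair is not None:
        print("chosen:", pair[0].text)
        print("rejected:", pair[1].text)
```

Note that the margin condition is stricter than the two score thresholds alone: a pair scored 7 versus 5 passes both thresholds but is dropped because its margin is only 2.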

Provided by:

yilingwang
