HA-DPO
Source: ModelScope community (updated 2025-09-24, indexed 2024-08-31)
Download link:
https://modelscope.cn/datasets/OmniData/HA-DPO
Resource description:
displayName: HA-DPO
labelTypes:
- Text
license:
- Apache 2.0
mediaTypes:
- Image
- Text
paperUrl: https://arxiv.org/abs/2311.16839
publishDate: ""
publishUrl: ""
publisher:
- Shanghai Artificial Intelligence Laboratory
tags: []
taskTypes:
- Reinforcement Learning
---
# HA-DPO data (Hallucination-aware Direct Preference Optimization)

## Introduction
This dataset provides hallucination-aware positive-negative (chosen/rejected) data for HA-DPO (Hallucination-aware Direct Preference Optimization), which is used to mitigate hallucination in large vision-language models (LVLMs). It consists of positive-negative data for 3 LVLMs (MiniGPT-4, InstructBLIP, and LLaVA-1.5) in 2 formats (dense image description and question answering).
## Data Construction
1. **Description Generation:** We randomly select 2K images from the Visual Genome (VG) dataset and use the LVLM to generate a detailed description of each image.
2. **GPT-4 Hallucination Detection and Correction:** GPT-4 checks whether the generated description contains hallucinations and revises a hallucinated description into a correct one, which becomes the positive sample.
3. **Style-consistent Data Augmentation:** To keep preference learning stable, GPT-4 rewrites the positive and negative samples obtained in the previous step while keeping each sample's polarity (positive or negative) unchanged. In addition, the positive and negative descriptions are converted into question-answering format.
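
The three steps above can be sketched as a small pipeline. This is only an illustrative outline, not the authors' code: the callables `describe`, `revise`, and `rewrite` are hypothetical stand-ins for the LVLM and the two GPT-4 stages, and skipping images whose description needs no correction is an assumption made here so that every record has a negative sample. The conversion to question-answering format is left out.

```python
from typing import Callable, Iterable, List, Optional, TypedDict


class PreferenceRecord(TypedDict):
    image_id: int
    chosen: List[str]
    rejected: List[str]


def build_records(
    image_ids: Iterable[int],
    describe: Callable[[int], str],                # step 1: LVLM description (hypothetical)
    revise: Callable[[int, str], Optional[str]],   # step 2: corrected text, or None if no hallucination
    rewrite: Callable[[str, int], List[str]],      # step 3: style-consistent rewrites (hypothetical)
    n_variants: int = 3,
) -> List[PreferenceRecord]:
    """Assemble description records with `chosen` and `rejected` lists."""
    records: List[PreferenceRecord] = []
    for image_id in image_ids:
        description = describe(image_id)
        corrected = revise(image_id, description)
        if corrected is None:
            # Assumption: without a detected hallucination there is no
            # negative sample, so the image is skipped.
            continue
        records.append(
            {
                "image_id": image_id,
                "chosen": rewrite(corrected, n_variants),
                "rejected": rewrite(description, n_variants),
            }
        )
    return records


# Toy stand-ins so the sketch runs without any model or API access.
def fake_describe(image_id: int) -> str:
    return f"image {image_id}: a dog and a frisbee"


def fake_revise(image_id: int, text: str) -> Optional[str]:
    # Pretend only even-numbered images contain a hallucinated frisbee.
    return text.replace(" and a frisbee", "") if image_id % 2 == 0 else None


def fake_rewrite(text: str, n: int) -> List[str]:
    return [f"{text} (variant {i})" for i in range(1, n + 1)]


records = build_records([2, 3], fake_describe, fake_revise, fake_rewrite)
```

Passing the model stages in as callables keeps the sketch independent of any particular LVLM or GPT-4 client.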
## Data Format
### Description
```json
[
{
"image_id": 2374756,
"chosen": [
"The picture portrays a crowd of individuals congregated on...",
"As seen in the image, a collection of people is assembled in...",
"In the depicted scene, a bunch of individuals has gathered in a field..."
],
"rejected": [
"The picture depicts a crowd of individuals assembled in a green field...",
"Seen in the picture is a collection of people congregated in a lush open space,...",
"The image presents a gathering of people in a verdant field,..."
]
},
...
]
```
- `image_id`: Visual Genome image ID.
- `chosen`: three correct (hallucination-free) descriptions of the image.
- `rejected`: three hallucinated descriptions of the image.
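
Since each description record carries several `chosen` and `rejected` texts, it usually has to be expanded into single preference pairs before training. The sketch below shows one plausible way to do this. Pairing every chosen description with every rejected one is an assumption, not something the dataset card specifies, and the `prompt` text is a hypothetical placeholder.

```python
import json
from itertools import product


def expand_description_record(record, prompt="Describe this image in detail."):
    """Turn one description record into single (chosen, rejected) pairs.

    Assumption: full cross product of chosen and rejected descriptions.
    The default prompt is a placeholder, not taken from the dataset.
    """
    return [
        {
            "image_id": record["image_id"],
            "prompt": prompt,
            "chosen": chosen,
            "rejected": rejected,
        }
        for chosen, rejected in product(record["chosen"], record["rejected"])
    ]


# A record shaped like the sample above, with shortened placeholder texts.
sample = json.loads(
    """
    {
      "image_id": 2374756,
      "chosen": ["correct A", "correct B", "correct C"],
      "rejected": ["hallucinated A", "hallucinated B", "hallucinated C"]
    }
    """
)

pairs = expand_description_record(sample)
```

With three texts on each side, the cross product yields nine pairs per image. Index-aligned pairing (first chosen with first rejected, and so on) would be an equally simple alternative that yields three.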
### Question-answering
```json
[
{
"image_id": 2324811,
"question": "Is there a backpack placed on the ground near the motorcycle?",
"chosen": "No, there isn't a backpack placed on the ground near the motorcycle. The backpack is attached to the back of the motorcycle, specifically on the seat.",
"rejected": "Yes, there is a backpack placed on the ground near the motorcycle."
},
...
]
```
- `image_id`: Visual Genome image ID.
- `chosen`: the correct (hallucination-free) answer to the question.
- `rejected`: the hallucinated answer to the question.
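
A question-answering record already maps onto a single preference example. The following is a minimal sketch of validating one record and converting it to a typed structure. The `PreferenceExample` class and the specific checks are illustrative choices made here, not part of the dataset or its tooling.

```python
from dataclasses import dataclass

REQUIRED_KEYS = ("image_id", "question", "chosen", "rejected")


@dataclass(frozen=True)
class PreferenceExample:
    """Hypothetical container for one preference example."""
    image_id: int
    prompt: str
    chosen: str
    rejected: str


def qa_record_to_example(record: dict) -> PreferenceExample:
    """Validate a question-answering record and convert it."""
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing keys: {missing}")
    if record["chosen"] == record["rejected"]:
        raise ValueError("chosen and rejected answers are identical")
    return PreferenceExample(
        image_id=int(record["image_id"]),
        prompt=record["question"],
        chosen=record["chosen"],
        rejected=record["rejected"],
    )


qa_sample = {
    "image_id": 2324811,
    "question": "Is there a backpack placed on the ground near the motorcycle?",
    "chosen": (
        "No, there isn't a backpack placed on the ground near the motorcycle. "
        "The backpack is attached to the back of the motorcycle, specifically on the seat."
    ),
    "rejected": "Yes, there is a backpack placed on the ground near the motorcycle.",
}

example = qa_record_to_example(qa_sample)
```

Rejecting records whose two answers are identical is a cheap sanity check, since such a pair would carry no preference signal.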
## Reference
```
@misc{zhao2023hallucinations,
title={Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization},
author={Zhiyuan Zhao and Bin Wang and Linke Ouyang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
year={2023},
eprint={2311.16839},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{conghui2022opendatalab,
author={He, Conghui and Li, Wei and Jin, Zhenjiang and Wang, Bin and Xu, Chao and Lin, Dahua},
title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
howpublished = {\url{https://opendatalab.com}},
year={2022}
}
```
## Download dataset
The dataset is hosted on ModelScope: https://modelscope.cn/datasets/OmniData/HA-DPO
Provider: maas
Created: 2024-07-02



