HA-DPO

Name: HA-DPO
Creator: maas
Published: 2025-09-24 14:07:24
License: No description provided

ModelScope community. Updated 2025-09-24; indexed 2024-08-31.

Download link:

https://modelscope.cn/datasets/OmniData/HA-DPO

Resource description:

---
displayName: HA-DPO
labelTypes:
  - Text
license:
  - Apache 2.0
mediaTypes:
  - Image
  - Text
paperUrl: https://arxiv.org/abs/2311.16839
publishDate: ""
publishUrl: ""
publisher:
  - Shanghai Artificial Intelligence Laboratory
tags: []
taskTypes:
  - Reinforcement Learning
---

# HA-DPO data (Hallucination-aware Direct Preference Optimization)

![overview](https://github.com/JulioZhao97/HA-DPO-video/assets/40555727/2adeca8a-394e-4b31-9bd7-efd3b9974014)

## Introduction

This dataset provides hallucination-aware positive-negative pairs for HA-DPO (Hallucination-aware Direct Preference Optimization), a method for mitigating hallucination in large vision-language models (LVLMs). It contains preference data for 3 LVLMs (MiniGPT-4, InstructBLIP, and LLaVA-1.5) in 2 formats (dense image description and question answering).

## Data Construction

1. **Description Generation:** We randomly select 2K images from the Visual Genome (VG) dataset and have the LVLM describe each image in as much detail as possible.
2. **GPT-4 Hallucination Detection and Correction:** GPT-4 checks whether the generated description contains hallucinations and revises any hallucinated description into a correct one, which becomes the hallucination-free positive sample.
3. **Style-consistent Data Augmentation:** To keep preference learning stable, GPT-4 rewrites the positive and negative samples from the previous step while leaving each sample's positive or negative status unchanged. We also convert the positive and negative descriptions into question-answering format.

## Data Format

### Description

```json
[
  {
    "image_id": 2374756,
    "chosen": [
      "The picture portrays a crowd of individuals congregated on...",
      "As seen in the image, a collection of people is assembled in...",
      "In the depicted scene, a bunch of individuals has gathered in a field..."
    ],
    "rejected": [
      "The picture depicts a crowd of individuals assembled in a green field...",
      "Seen in the picture is a collection of people congregated in a lush open space,...",
      "The image presents a gathering of people in a verdant field,..."
    ]
  },
  ...
]
```

- `image_id`: Visual Genome image ID.
- `chosen`: 3 correct (hallucination-free) descriptions of the image.
- `rejected`: 3 hallucinated descriptions of the image.

### Question-answering

```json
[
  {
    "image_id": 2324811,
    "question": "Is there a backpack placed on the ground near the motorcycle?",
    "chosen": "No, there isn't a backpack placed on the ground near the motorcycle. The backpack is attached to the back of the motorcycle, specifically on the seat.",
    "rejected": "Yes, there is a backpack placed on the ground near the motorcycle."
  },
  ...
]
```

- `image_id`: Visual Genome image ID.
- `chosen`: the correct (hallucination-free) answer to the question.
- `rejected`: the hallucinated answer to the question.

## Reference

```
@misc{zhao2023hallucinations,
  title={Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization},
  author={Zhiyuan Zhao and Bin Wang and Linke Ouyang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
  year={2023},
  eprint={2311.16839},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{conghui2022opendatalab,
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Wang, Bin and Xu, Chao and Lin, Dahua},
  title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
  howpublished={\url{https://opendatalab.com}},
  year={2022}
}
```

## Download dataset

:modelscope-code[]{type="git"}
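As a rough illustration of how the description and question-answering formats above could be consumed, the following Python sketch flattens both into a common `prompt` / `chosen` / `rejected` layout. Several things here are assumptions rather than part of the dataset: the function names, the `DESCRIPTION_PROMPT` text (description records do not store a prompt), and the index-wise pairing of the 3 chosen with the 3 rejected descriptions. The source does not say how the authors pair them, so treat this as one possible reading.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical prompt: description records carry no prompt of their own.
DESCRIPTION_PROMPT = "Describe this image in detail."


def description_pairs(record: Dict, prompt: str = DESCRIPTION_PROMPT) -> Iterator[Dict]:
    """Yield one preference pair per list position of a description record.

    Pairing chosen[i] with rejected[i] is an assumption made for this sketch.
    """
    for chosen, rejected in zip(record["chosen"], record["rejected"]):
        yield {
            "image_id": record["image_id"],
            "prompt": prompt,
            "chosen": chosen,
            "rejected": rejected,
        }


def qa_pair(record: Dict) -> Dict:
    """Map a question-answering record onto the same flat pair layout."""
    return {
        "image_id": record["image_id"],
        "prompt": record["question"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


def load_pairs(description_json: str, qa_json: str) -> List[Dict]:
    """Parse both JSON documents (as strings) into one flat list of pairs."""
    pairs: List[Dict] = []
    for record in json.loads(description_json):
        pairs.extend(description_pairs(record))
    for record in json.loads(qa_json):
        pairs.append(qa_pair(record))
    return pairs
```

Calling `load_pairs` with the contents of a description file and a question-answering file returns a single list in which every entry keeps its Visual Genome `image_id`, so the matching image can be looked up separately. Note that the excerpts shown above contain `...` placeholders and are not valid JSON as printed; the sketch expects the complete files.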


Provider:

maas

Created:

2024-07-02
