
HA-DPO

ModelScope community. Updated 2025-09-24; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/OmniData/HA-DPO
Resource overview:
---
displayName: HA-DPO
labelTypes:
  - Text
license:
  - Apache 2.0
mediaTypes:
  - Image
  - Text
paperUrl: https://arxiv.org/abs/2311.16839
publishDate: ""
publishUrl: ""
publisher:
  - Shanghai Artificial Intelligence Laboratory
tags: []
taskTypes:
  - Reinforcement Learning
---

# HA-DPO data (Hallucination-aware Direct Preference Optimization)

![overview](https://github.com/JulioZhao97/HA-DPO-video/assets/40555727/2adeca8a-394e-4b31-9bd7-efd3b9974014)

## Introduction

This dataset provides hallucination-aware positive-negative sample pairs for HA-DPO (Hallucination-aware Direct Preference Optimization), which is used to mitigate hallucination in large vision-language models (LVLMs). It contains positive-negative data for 3 LVLMs (MiniGPT-4, InstructBLIP, and LLaVA-1.5) in 2 formats (dense image description and question answering).

## Data Construction

1. **Description Generation:** We randomly select 2K images from the Visual Genome (VG) dataset and have the LVLM describe each image in as much detail as possible.
2. **GPT-4 Hallucination Detection and Correction:** GPT-4 checks whether the generated description contains hallucinations and revises any hallucinated description into a correct one, which becomes the positive sample.
3. **Style-consistent Data Augmentation:** To keep preference learning stable, GPT-4 rewrites the positive and negative samples obtained in the previous step while ensuring that each sample stays positive or negative as before. We further convert the positive and negative description samples into question-answering format.

## Data Format

### Description

```json
[
  {
    "image_id": 2374756,
    "chosen": [
      "The picture portrays a crowd of individuals congregated on...",
      "As seen in the image, a collection of people is assembled in...",
      "In the depicted scene, a bunch of individuals has gathered in a field..."
    ],
    "rejected": [
      "The picture depicts a crowd of individuals assembled in a green field...",
      "Seen in the picture is a collection of people congregated in a lush open space,...",
      "The image presents a gathering of people in a verdant field,..."
    ]
  },
  ...
]
```

- `image_id`: Visual Genome image id.
- `chosen`: 3 correct (hallucination-free) descriptions of the image.
- `rejected`: 3 hallucinated descriptions of the image.

### Question-answering

```json
[
  {
    "image_id": 2324811,
    "question": "Is there a backpack placed on the ground near the motorcycle?",
    "chosen": "No, there isn't a backpack placed on the ground near the motorcycle. The backpack is attached to the back of the motorcycle, specifically on the seat.",
    "rejected": "Yes, there is a backpack placed on the ground near the motorcycle."
  },
  ...
]
```

- `image_id`: Visual Genome image id.
- `chosen`: the correct (hallucination-free) answer to the question.
- `rejected`: the hallucinated answer to the question.

## Reference

```
@misc{zhao2023hallucinations,
  title={Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization},
  author={Zhiyuan Zhao and Bin Wang and Linke Ouyang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
  year={2023},
  eprint={2311.16839},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{conghui2022opendatalab,
  author={He, Conghui and Li, Wei and Jin, Zhenjiang and Wang, Bin and Xu, Chao and Lin, Dahua},
  title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
  howpublished = {\url{https://opendatalab.com}},
  year={2022}
}
```

## Download dataset

The dataset is hosted on ModelScope at https://modelscope.cn/datasets/OmniData/HA-DPO.
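As a rough illustration of how the two record layouts described under Data Format might be consumed, the sketch below flattens both into a common `prompt` / `chosen` / `rejected` layout, which is a shape preference-optimization trainers commonly expect. Several things here are assumptions rather than part of the dataset card: the helper names, the `DESCRIPTION_PROMPT` text (the card does not state which instruction was used), and the choice to pair every chosen description with every rejected one instead of pairing them index-wise.

```python
import json
from itertools import product

# Hypothetical instruction for description records; the dataset card
# does not say which prompt was used to elicit the descriptions.
DESCRIPTION_PROMPT = "Describe this image in detail."


def description_pairs(record, prompt=DESCRIPTION_PROMPT):
    """Expand one description record into flat preference pairs.

    Every chosen description is paired with every rejected one. This
    cross-product pairing is an assumption; zip() would pair index-wise.
    """
    return [
        {
            "image_id": record["image_id"],
            "prompt": prompt,
            "chosen": good,
            "rejected": bad,
        }
        for good, bad in product(record["chosen"], record["rejected"])
    ]


def qa_pair(record):
    """Map one question-answering record to the same flat layout."""
    return {
        "image_id": record["image_id"],
        "prompt": record["question"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


def load_pairs(path, kind):
    """Load a JSON file of records and flatten it.

    `path` is whatever local file holds the records (hypothetical here);
    `kind` is "description" or "qa".
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if kind == "description":
        return [pair for rec in records for pair in description_pairs(rec)]
    if kind == "qa":
        return [qa_pair(rec) for rec in records]
    raise ValueError(f"unknown kind: {kind!r}")


# Records copied from the examples in the card (texts are elided there too).
description_record = {
    "image_id": 2374756,
    "chosen": [
        "The picture portrays a crowd of individuals congregated on...",
        "As seen in the image, a collection of people is assembled in...",
        "In the depicted scene, a bunch of individuals has gathered in a field...",
    ],
    "rejected": [
        "The picture depicts a crowd of individuals assembled in a green field...",
        "Seen in the picture is a collection of people congregated in a lush open space,...",
        "The image presents a gathering of people in a verdant field,...",
    ],
}
qa_record = {
    "image_id": 2324811,
    "question": "Is there a backpack placed on the ground near the motorcycle?",
    "chosen": "No, there isn't a backpack placed on the ground near the motorcycle.",
    "rejected": "Yes, there is a backpack placed on the ground near the motorcycle.",
}

print(len(description_pairs(description_record)))
print(qa_pair(qa_record)["prompt"])
```

With 3 chosen and 3 rejected descriptions per image, the cross-product pairing yields 3 x 3 = 9 pairs per description record. Index-wise pairing would yield 3 and may suit cases where each rejected text is meant to be the direct counterpart of one chosen text.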

Provided by: maas
Created: 2024-07-02