Starrrrrry/RLAIF-V-Bias-Dataset

Name: Starrrrrry/RLAIF-V-Bias-Dataset
Creator: Starrrrrry
Published: 2024-12-16 11:56:07
License: No description available

Source: Hugging Face. Updated 2024-12-16; indexed 2024-12-14.

Download link:

https://hf-mirror.com/datasets/Starrrrrry/RLAIF-V-Bias-Dataset

Overview:

RLAIF-V-Bias-Dataset is built on RLAIF-V-Dataset and is intended to mitigate modality bias in multimodal large language models (MLLMs). The underlying RLAIF-V-Dataset provides 83,132 high-quality preference pairs, with instructions collected from MSCOCO, ShareGPT-4V, MovieNet, Google Landmark v2, VQA v2, OKVQA, and TextVQA. The image description prompts introduced in RLHF-V are also adopted as long-form image-captioning instructions. On top of this data, the LLaVA-v1.5-7b model is guided to produce two kinds of biased answers: language-biased answers that rely too heavily on the textual modality (“question_only”) and vision-biased answers that rely too heavily on the visual modality (“image_only”). Generating biased responses is imperfect: the model’s pretrained knowledge and its refusals to respond yield a substantial number of noisy samples. Noise-Aware Preference Optimization (NaPO) is therefore proposed to counteract the noise in the data.
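
As a rough illustration of how the biased answers could feed preference optimization, the sketch below pairs a preferred answer with each biased answer as the rejected side. Only the names "question_only" and "image_only" come from the description above; the keys "question" and "chosen", the output keys, the build_bias_pairs helper, and the pairing scheme itself are assumptions made for illustration and may not match the dataset's actual schema or the authors' procedure.

```python
from typing import Dict, List

# "question_only" and "image_only" are the bias labels named in the description.
# Every other key used below ("question", "chosen", "prompt", "rejected",
# "bias_type") is a hypothetical placeholder, not a confirmed column name.
BIAS_FIELDS = ("question_only", "image_only")


def build_bias_pairs(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Pair the preferred answer with each usable biased answer."""
    preferred = record["chosen"].strip()
    pairs = []
    for field in BIAS_FIELDS:
        biased = (record.get(field) or "").strip()
        # Skip missing answers and answers identical to the preferred one,
        # since neither carries a preference signal.
        if not biased or biased == preferred:
            continue
        pairs.append({
            "prompt": record["question"],
            "chosen": record["chosen"],
            "rejected": biased,
            "bias_type": field,
        })
    return pairs


# Made-up record for demonstration; not taken from the dataset.
example = {
    "question": "What color is the bus?",
    "chosen": "The bus is red.",
    "question_only": "Buses are usually yellow.",
    "image_only": "",
}
pairs = build_bias_pairs(example)
for pair in pairs:
    print(pair["bias_type"], "->", pair["rejected"])
```

The filter only drops empty or duplicate answers. It does not detect the noisy samples described above (for example, refusals or answers that are accidentally correct), which is the problem NaPO is proposed to address.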

Provided by:

Starrrrrry
