Starrrrrry/RLAIF-V-Bias-Dataset
Source: Hugging Face | Updated: 2024-12-16 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Starrrrrry/RLAIF-V-Bias-Dataset
Description:
RLAIF-V-Bias-Dataset is built on RLAIF-V-Dataset and is intended to mitigate modality bias in multimodal large language models (MLLMs); its biased answers are generated with LLaVA-v1.5-7b. The underlying RLAIF-V-Dataset provides 83,132 high-quality preference pairs, with instructions collected from MSCOCO, ShareGPT-4V, MovieNet, Google Landmark v2, VQA v2, OKVQA, and TextVQA. The image description prompts introduced in RLHF-V are also adopted as long-form image-captioning instructions. On top of this data, LLaVA-v1.5-7b is guided to produce language-biased answers that rely too heavily on the textual modality ("question_only") and vision-biased answers that rely too heavily on the visual modality ("image_only"). Because the model's pretrained knowledge and its refusals to respond interfere with this process, many of the generated samples are noisy. Noise-Aware Preference Optimization (NaPO) is therefore proposed to counteract the noise in the data.
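As a rough illustration of how the two biased answers might be turned into preference pairs, the sketch below pairs a preferred answer with each biased answer for a single record. The field names "question_only" and "image_only" come from the description above; the "question" and "chosen" field names, the sample record, and the use of biased answers as the dispreferred side are assumptions made for this example, not a confirmed schema or the authors' pipeline. The simple skip rule is only a placeholder and is not NaPO.

```python
from typing import Dict, List, Tuple

# Field names quoted in the dataset description.
BIAS_FIELDS = ("question_only", "image_only")


def build_bias_pairs(record: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (prompt, preferred, dispreferred) triples for one record.

    Assumes (hypothetically) that the record stores the instruction under
    "question" and the preferred answer under "chosen".
    """
    prompt = record["question"]
    preferred = record["chosen"]
    pairs = []
    for field in BIAS_FIELDS:
        biased = (record.get(field) or "").strip()
        # Skip missing answers and answers identical to the preferred one,
        # because such a pair carries no preference signal.
        if biased and biased != preferred.strip():
            pairs.append((prompt, preferred, biased))
    return pairs


# Made-up record for illustration only.
example = {
    "question": "What color is the bus?",
    "chosen": "The bus is red.",
    "question_only": "Buses are usually yellow.",
    "image_only": "A red bus is parked on a street.",
}

for prompt, preferred, dispreferred in build_bias_pairs(example):
    print(f"{prompt} | preferred: {preferred} | dispreferred: {dispreferred}")
```

Keeping the pairing logic in a plain function over dictionaries makes it easy to apply to records loaded by any means, once the actual column names have been checked against the dataset itself.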
Provided by:
Starrrrrry



