
Starrrrrry/RLAIF-V-Bias-Dataset

Source: Hugging Face. Updated 2024-12-16. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Starrrrrry/RLAIF-V-Bias-Dataset
Description:
RLAIF-V-Bias-Dataset is built on top of RLAIF-V-Dataset and is intended to mitigate modality bias in multimodal large language models (MLLMs). RLAIF-V-Dataset provides 83,132 high-quality preference pairs, with instructions collected from a range of datasets including MSCOCO, ShareGPT-4V, MovieNet, Google Landmark v2, VQA v2, OKVQA, and TextVQA. It also adopts the image description prompts introduced in RLHF-V as long-form image-captioning instructions. On top of this data, the LLaVA-v1.5-7b model is guided to produce two kinds of biased answers: language-biased answers that rely too heavily on the textual modality (“question_only”) and vision-biased answers that rely too heavily on the visual modality (“image_only”). Because of factors such as the model’s pretrained knowledge and its refusals to respond, this generation process yields a large number of noisy samples. Noise-Aware Preference Optimization (NaPO) is therefore proposed to counteract the noise in the data.
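
To make the role of the two biased answers more concrete, the sketch below pairs a preferred answer with each biased answer and drops obviously noisy ones. It is only an illustration under stated assumptions: the record keys `question` and `chosen`, the refusal markers, and both helper functions are hypothetical, and the real column layout may differ. Only the “question_only” and “image_only” labels come from the description. The hand-written filter is not NaPO; it is a stand-in that shows the kind of noise (refusals, answers that match the preferred one) the description mentions.

```python
from typing import Dict, List

# Hypothetical refusal markers; the dataset description does not define noise criteria.
REFUSAL_MARKERS = ("i cannot", "i can't", "unable to", "sorry")


def is_noisy(biased: str, chosen: str) -> bool:
    """Heuristic noise check (an assumption for illustration, not NaPO)."""
    text = biased.strip().lower()
    if not text:
        return True  # empty generation
    if text == chosen.strip().lower():
        return True  # the "biased" answer matches the preferred one
    return any(marker in text for marker in REFUSAL_MARKERS)  # refusal


def build_bias_pairs(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Pair the preferred answer with each usable biased answer."""
    pairs = []
    for bias_field in ("question_only", "image_only"):
        biased = record.get(bias_field, "")
        if is_noisy(biased, record["chosen"]):
            continue
        pairs.append({
            "question": record["question"],
            "chosen": record["chosen"],
            "rejected": biased,
            "bias_type": bias_field,
        })
    return pairs


# Made-up record, not taken from the dataset.
example = {
    "question": "What color is the bus in the image?",
    "chosen": "The bus is red.",
    "question_only": "Buses are usually yellow.",
    "image_only": "I'm sorry, I cannot answer that.",
}
pairs = build_bias_pairs(example)
```

In this made-up record, the “image_only” answer is a refusal and is dropped, so only the language-biased pair remains.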
Provided by:
Starrrrrry