RLAIF-V-Dataset

Name: RLAIF-V-Dataset
Creator: maas
Published: 2026-05-16 19:59:12
License: No description available

ModelScope community: updated 2026-05-16, indexed 2024-05-25

Download link:

https://modelscope.cn/datasets/OpenBMB/RLAIF-V-Dataset

Description:

# Dataset Card for RLAIF-V-Dataset

This dataset was introduced in [RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness](https://huggingface.co/papers/2405.17220).

[GitHub](https://github.com/RLHF-V/RLAIF-V)

This dataset was also used in [MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe](https://huggingface.co/papers/2509.18154).

## News

* **[2025.09.18]** 🎉 Our data is used in the powerful [MiniCPM-V 4.5](https://huggingface.co/openbmb/MiniCPM-V-4_5) model, a state-of-the-art end-side MLLM that achieves GPT-4o level performance!
* **[2025.03.01]** 🎉 RLAIF-V is accepted by CVPR 2025! You can access the latest version of the paper [here](https://arxiv.org/abs/2405.17220).
* **[2024.05.28]** 📃 Our paper is now accessible on [arXiv](https://arxiv.org/abs/2405.17220)!
* **[2024.05.20]** 🔥 Our data is used in [MiniCPM-Llama3-V 2.5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5), the first end-side MLLM to achieve GPT-4V level performance!

## Dataset Summary

RLAIF-V-Dataset is a **large-scale multimodal feedback dataset**. The dataset provides **high-quality feedback** with a total of **83,132 preference pairs**, where the **instructions are collected from a diverse range of datasets** including MSCOCO, ShareGPT-4V, MovieNet, Google Landmark v2, VQA v2, OKVQA, and TextVQA. In addition, we adopt the image description prompts introduced in RLHF-V as long-form image-captioning instructions.

By training on these data, our models can reach **superior trustworthiness compared to both open-source and proprietary models**.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/XWrALoch6pceJsoxaMHKe.png" alt="fig1" width="45%"/>

More experimental results are in the following table. By applying RLAIF-V, we present [RLAIF-V 7B](https://huggingface.co/openbmb/RLAIF-V-7B) (**the most trustworthy variant of LLaVA 1.5**) and [RLAIF-V 12B](https://huggingface.co/openbmb/RLAIF-V-12B) (**the most trustworthy MLLM**), with outstanding trustworthiness and competitive general performance:

<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/dhsi5_okbtlBp2pfYOkFK.png" alt="fig1" width="70%"/>

Our data also exhibits **good generalizability**: it improves the trustworthiness of a diverse set of MLLMs.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/_au9ixUW3f7vOO0eswpsn.png" alt="fig2" width="45%"/>

## Related Sources

- Models Trained on RLAIF-V:
  - 💎 [MiniCPM-V Series](https://github.com/OpenBMB/MiniCPM-V): MiniCPM-V is a series of end-side MLLMs with performance comparable to GPT-4V.
  - 🏆 [RLAIF-V](https://github.com/RLHF-V/RLAIF-V): RLAIF-V is a series of MLLMs with far greater trustworthiness than GPT-4V.

## Usage

```python
from datasets import load_dataset

data = load_dataset("openbmb/RLAIF-V-Dataset")
```

## Data fields

|      | Key              | Description                                                  |
| ---- | ---------------- | ------------------------------------------------------------ |
| 0    | `ds_name`        | Dataset name.                                                |
| 1    | `image`          | Dict containing the image path and bytes. If loaded by `load_dataset`, it can be automatically converted into a PIL Image. |
| 2    | `question`       | Input query for MLLMs.                                       |
| 3    | `chosen`         | Chosen response for the question.                            |
| 4    | `rejected`       | Rejected response for the question.                          |
| 5    | `origin_dataset` | Original dataset for the image or question.                  |
| 6    | `origin_split`   | Meta information for each data item, including the name of the model used to generate the chosen and rejected answer pair, the labeling model that provides feedback, and the question type ("detailed description" or "question answering"). |
| 7    | `idx`            | Data index.                                                  |
| 8    | `image_path`     | Image path.                                                  |

## Citation

If you find our model/code/paper helpful, please consider citing our papers 📝:

```bibtex
@article{yu2023rlhf,
  title={Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback},
  author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
  journal={arXiv preprint arXiv:2312.00849},
  year={2023}
}

@article{yu2024rlaifv,
  title={RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness},
  author={Tianyu Yu and Haoye Zhang and Qiming Li and Qixin Xu and Yuan Yao and Da Chen and Xiaoman Lu and Ganqu Cui and Yunkai Dang and Taiwen He and Xiaocheng Feng and Jun Song and Bo Zheng and Zhiyuan Liu and Tat-Seng Chua and Maosong Sun},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024},
}

@misc{yu2025minicpmv45cookingefficient,
  title={MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe},
  author={Tianyu Yu and Zefan Wang and Chongyi Wang and Fuwei Huang and Wenshuo Ma and Zhihui He and Tianchi Cai and Weize Chen and Yuxiang Huang and Yuanqian Zhao and Bokai Xu and Junbo Cui and Yingjing Xu and Liqing Ruan and Luoyuan Zhang and Hanyu Liu and Jingkun Tang and Hongyuan Liu and Qining Guo and Wenhao Hu and Bingxiang He and Jie Zhou and Jie Cai and Ji Qi and Zonghao Guo and Chi Chen and Guoyang Zeng and Yuxuan Li and Ganqu Cui and Ning Ding and Xu Han and Yuan Yao and Zhiyuan Liu and Maosong Sun},
  year={2025},
  eprint={2509.18154},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.18154},
}
```
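As a rough illustration of how the fields in the data fields table might be consumed, the sketch below maps records onto prompt/chosen/rejected triples and counts records per `origin_dataset`. The helper names `to_preference_triple` and `count_by_origin` are hypothetical, and the sample records are invented stand-ins that only mimic the documented keys; nothing here is taken from the authors' training code.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def to_preference_triple(record: Dict[str, Any]) -> Dict[str, str]:
    """Keep only the text fields a preference-learning loop typically needs."""
    return {
        "prompt": record["question"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


def count_by_origin(records: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count records per source dataset using the `origin_dataset` field."""
    return dict(Counter(r["origin_dataset"] for r in records))


# Hypothetical records: the keys follow the data fields table,
# but every value below is invented for illustration.
sample_records: List[Dict[str, Any]] = [
    {
        "ds_name": "example",
        "image": None,  # placeholder; the real field holds image data
        "question": "What is on the table?",
        "chosen": "A red mug next to a laptop.",
        "rejected": "A red mug next to a laptop and a cat.",
        "origin_dataset": "VQA v2",
        "origin_split": "",
        "idx": 0,
        "image_path": "images/0.jpg",
    },
    {
        "ds_name": "example",
        "image": None,
        "question": "Describe the image in detail.",
        "chosen": "Two people walk along a beach at sunset.",
        "rejected": "Three people walk along a beach at noon.",
        "origin_dataset": "MSCOCO",
        "origin_split": "",
        "idx": 1,
        "image_path": "images/1.jpg",
    },
    {
        "ds_name": "example",
        "image": None,
        "question": "How many dogs are visible?",
        "chosen": "One dog is visible.",
        "rejected": "Two dogs are visible.",
        "origin_dataset": "MSCOCO",
        "origin_split": "",
        "idx": 2,
        "image_path": "images/2.jpg",
    },
]

triples = [to_preference_triple(r) for r in sample_records]
origin_counts = count_by_origin(sample_records)

print(triples[0])
print(origin_counts)
```

The same helpers could presumably be applied to rows returned by `load_dataset("openbmb/RLAIF-V-Dataset")`; the available split names and the exact format of `origin_split` should be checked against the loaded data rather than assumed.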

Provider: maas

Created: 2024-06-06
