RLHF-V-Dataset

Name: RLHF-V-Dataset
Creator: maas
Published: 2026-01-02 16:33:45
License: No description provided

ModelScope community: updated 2026-01-02, indexed 2025-05-17

Download link:

https://modelscope.cn/datasets/OpenBMB/RLHF-V-Dataset


Resource overview:

# Dataset Card for RLHF-V-Dataset

[Project Page](https://rlhf-v.github.io/) | [Paper](https://arxiv.org/abs/2312.00849) | [GitHub](https://github.com/RLHF-V/RLHF-V)

## Updates

* [2024.05.28] 📃 Our RLAIF-V paper is now accessible on [arXiv](https://arxiv.org/abs/2405.17220)!
* [2024.05.20] 🎉 We release a new feedback dataset, [RLAIF-V-Dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset), which is **a large-scale diverse-task multimodal feedback dataset constructed using open-source models**. You can download the corresponding [dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset) and models ([7B](https://huggingface.co/openbmb/RLAIF-V-7B), [12B](https://huggingface.co/openbmb/RLAIF-V-12B)) now!
* [2024.04.11] 🔥 **Our data is used in [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2)**, an end-side multimodal large language model whose trustworthiness is comparable to that of GPT-4V!
* [2024.01.06] 🔥 **A larger, more diverse set of fine-grained human correction data is available now!** 🔥 The newly released data contains about **5.7k fine-grained human corrections** covering the output of **more powerful models** (Qwen-VL-Chat, InstructBLIP, etc.). We also **expand the image types** from everyday scenes to diverse styles and themes (WikiArt, landmarks, scene texts, etc.).
* [2024.01.05] 🔧 We reformatted our dataset, and it is now **more convenient to preview and use**! The dataset now supports the `load_dataset` function, and the data content can be easily previewed online.
* [2023.12.15] We incorporated a new annotation subset with an additional **1065 fine-grained annotations** into our dataset!

## Dataset Summary

RLHF-V-Dataset is the human preference data used in "**RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback**".

We collected a large amount of **fine-grained segment-level human corrections** on diverse instructions, including detailed descriptions and question-answering instructions. The dataset contains a total of 5,733 preference pairs.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/jerEZiHDDc2ceF9anVHR-.png" alt="fig1" width="60%"/>

Using our dataset can **reduce model hallucinations by 34.8%** while **keeping informativeness**.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/7xJEdKXeW33iKdHqJwvNN.png" alt="fig2" width="70%"/>

## Usage

```python
from datasets import load_dataset

data = load_dataset("HaoyeZhang/RLHF-V-Dataset")
```

## Data fields

|      | Key              | Description                                                  |
| ---- | ---------------- | ------------------------------------------------------------ |
| 0    | `ds_name`        | Dataset name.                                                |
| 1    | `image`          | Dict containing path and bytes. If loaded by `load_dataset`, it can be automatically converted into a PIL Image. |
| 2    | `text`           | Preference data. Each data item contains a dict with the keys "question", "chosen", and "rejected". |
| 3    | `origin_dataset` | Original dataset for annotation, which is not used in training. |
| 4    | `origin_split`   | Meta information for each data item, including the name of the model we use to generate the original answer, and the question type ("detailed description" or "question answering"). |
| 5    | `idx`            | Data index.                                                  |
| 6    | `image_path`     | Image path.                                                  |

## Citation

If you find this dataset helpful, please consider citing our papers 📝:

```
@article{yu2023rlhf,
  title={Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback},
  author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
  journal={arXiv preprint arXiv:2312.00849},
  year={2023}
}

@article{yu2024rlaifv,
  title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness},
  author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024},
}
```
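
As a follow-up to the Data fields table, here is a minimal sketch of how one record's `text` field might be unpacked into a preference triple. Several things here are assumptions rather than part of the dataset card: the helper name `to_preference_pair`, the output key `prompt`, and the possibility that `text` arrives as a JSON-encoded string instead of a dict (the sketch accepts both). The example record is invented for illustration and is not taken from the dataset.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = {"question", "chosen", "rejected"}


def to_preference_pair(record: Dict[str, Any]) -> Dict[str, str]:
    """Extract the question and the chosen/rejected answers from one record.

    `to_preference_pair` is a hypothetical helper, not part of the dataset
    or of the `datasets` library.
    """
    text = record["text"]
    # Assumption: `text` may be a dict or a JSON-encoded string.
    if isinstance(text, str):
        text = json.loads(text)
    missing = REQUIRED_KEYS - set(text)
    if missing:
        raise KeyError(f"record {record.get('idx')} lacks keys: {sorted(missing)}")
    # The output key "prompt" is an illustrative naming choice.
    return {
        "prompt": text["question"],
        "chosen": text["chosen"],
        "rejected": text["rejected"],
    }


# Invented record for illustration only.
example = {
    "ds_name": "RLHF-V-Dataset",
    "idx": 0,
    "text": json.dumps(
        {
            "question": "What is in the image?",
            "chosen": "A dog on a sofa.",
            "rejected": "A cat on a sofa.",
        }
    ),
}

pair = to_preference_pair(example)
print(pair["prompt"])
```

With real data, the same helper could presumably be applied to each item of a split returned by `load_dataset`; the split name to iterate over is not stated in the card and would need to be checked against the loaded object.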


Provider:

maas

Created:

2025-05-15
