MLLM-as-a-Judge-GRPO

Name: MLLM-as-a-Judge-GRPO
Creator: maas
Published: 2025-12-06 00:07:13
License: No description provided

ModelScope Community. Updated 2025-12-06; indexed 2025-12-06.

Download link:

https://modelscope.cn/datasets/huangjjaxi/MLLM-as-a-Judge-GRPO

Resource description:

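# MLLM-as-a-Judge Dataset (GRPO Training Format)

## Overview

This dataset is a formatted version prepared specifically for **GRPO (Group Relative Policy Optimization) training**. It is built from the original MLLM-as-a-Judge data, which has been split and restructured according to the rules below to support preference learning in pairwise-comparison settings.

## Split Rules

The split strictly follows the **source dataset** principle: each source dataset is assigned in its entirety to either the training set or the test set, as follows.

### Training Set Sources

The training set contains all image-instruction pairs from the following datasets:

| Dataset | Description / Domain |
| :--- | :--- |
| MS-COCO | General image captioning |
| Conceptual Captions | Web image captioning |
| ChartQA | Chart question answering |
| InfographicsQA | Infographic question answering |
| TextVQA | Text-based visual question answering |
| WIT | Wikipedia image-text |
| DiffusionDB | AI-generated images |
| CC-3M | Conceptual captions |
| VisIT-Bench | Visual instruction following |
| Mind2Web | Web interaction |
| AesBench | Aesthetic assessment |

### Test Set Sources

The test set contains all image-instruction pairs from the following datasets and is used to evaluate specific capabilities:

| Dataset | Capability Evaluated |
| :--- | :--- |
| MathVista | Mathematical reasoning |
| ScienceQA | Scientific knowledge reasoning |
| MM-Vet | Comprehensive capability evaluation |

## Task Scale and Split Ratios

The dataset covers three task types. Their training and test sizes are:

| Task type | Training samples | Test samples | Training share |
| :--- | :---: | :---: | :---: |
| Score | 4,273 | 1,446 | **74.7%** |
| Pair | 5,660 | 2,096 | **73.0%** |
| Batch | 1,102 | 367 | **75.0%** |

### Overall Statistics

- **Training samples**: 11,035 (4,273 + 5,660 + 1,102)
- **Test samples**: 3,909 (1,446 + 2,096 + 367)
- **Total samples**: 14,944
- **Training share**: **73.8%** (11,035 / 14,944)
- **Test share**: **26.2%** (3,909 / 14,944)

> **Note**: The final training set was **deduplicated, restructured, and stripped of ties**, resulting in a **final total of 15,033 samples**.

## Training Data Format

Each sample is stored as a single JSONL line with the fields below (pretty-printed here for readability):

```json
{
  "id": 0,
  "pair_id": 0,
  "image_path": "0.jpg",
  "original_dataset": "coco",
  "instruction": "Please analyse this figure in detail and answer the following question based on this figure: Why are the men bending down?",
  "answer1": {
    "name": "llava",
    "answer": "In the image, the men are bending down to pick up a baseball that has been hit and is now on the ground. This is a typical scene in a baseball game, where the batter has just hit the ball and the fielders are trying to catch or pick up the ball to get the batter or other runners out. The presence of a baseball bat, baseball glove, and sports ball in the image further supports the context of a baseball game."
  },
  "answer2": {
    "name": "cogvlm",
    "answer": "The men are bending down because they are involved in a baseball game. The batter is preparing to hit the ball thrown by the pitcher, and the catcher is positioned behind the batter to catch the ball if the batter misses or decides not to swing. The umpire is closely observing the pitch to make a call on whether it's a strike or a ball."
  },
  "human": "B"
}
```

### The `human` Field

This field records the human judgment of the relative quality of the two answers. It takes one of three values:

| Label | Meaning |
| :---: | :--- |
| **A** | `answer1` is better than `answer2` |
| **B** | `answer2` is better than `answer1` |
| **C** | `answer1` and `answer2` are of comparable quality |

> **Note**: Ties (`C`) have been removed from the final training set to ensure a clear preference signal.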
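Because the split is made per source dataset, deciding which split a source belongs to is a plain lookup. The sketch below is illustrative only: it keys on the dataset names from the two source tables above, and `split_for` is a hypothetical helper. The records' `original_dataset` field uses short identifiers (the example shows `coco`), and the card does not list the identifier for every source, so mapping records onto these names would be an additional assumption.

```python
# Source datasets per split, copied from the two source tables above.
TRAIN_SOURCES = {
    "MS-COCO", "Conceptual Captions", "ChartQA", "InfographicsQA", "TextVQA",
    "WIT", "DiffusionDB", "CC-3M", "VisIT-Bench", "Mind2Web", "AesBench",
}
TEST_SOURCES = {"MathVista", "ScienceQA", "MM-Vet"}

# Splitting by source means no dataset may appear on both sides.
assert TRAIN_SOURCES.isdisjoint(TEST_SOURCES)


def split_for(source: str) -> str:
    """Return the split that a whole source dataset belongs to (hypothetical helper)."""
    if source in TRAIN_SOURCES:
        return "train"
    if source in TEST_SOURCES:
        return "test"
    raise KeyError(f"unknown source dataset: {source!r}")
```

Assigning whole sources rather than individual samples keeps the test domains (math, science, MM-Vet) unseen during training, which is what makes the test set a measure of transfer.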
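The shares in the task-scale table and the overall statistics follow directly from the sample counts. This short check recomputes them from the table's numbers; the variable and function names are illustrative. It covers only the pre-processing counts: the 15,033-sample figure quoted for the final training set comes from the later deduplication and restructuring and cannot be derived from these numbers.

```python
# (training, test) counts per task type, copied from the task-scale table above.
COUNTS = {"Score": (4273, 1446), "Pair": (5660, 2096), "Batch": (1102, 367)}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the tables."""
    return round(100 * part / whole, 1)


# Per-task training share: train / (train + test).
per_task = {task: pct(tr, tr + te) for task, (tr, te) in COUNTS.items()}

# Overall totals and shares across all three task types.
total_train = sum(tr for tr, _ in COUNTS.values())
total_test = sum(te for _, te in COUNTS.values())
total = total_train + total_test

print(per_task)                                # {'Score': 74.7, 'Pair': 73.0, 'Batch': 75.0}
print(total_train, total_test, total)          # 11035 3909 14944
print(pct(total_train, total), pct(total_test, total))  # 73.8 26.2
```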
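A minimal loader for the record layout shown above might look like the following. It assumes one JSON object per line with exactly the fields in the example; `parse_pairs`, `load_pairs`, and `chosen_rejected` are hypothetical names, not part of any released tooling. It maps the `human` label to the preferred answer and can skip `C` ties. Since ties are already removed from the final training set, the tie filter only matters for data that still contains them.

```python
import json
from typing import Iterable, Iterator

# Meaning of each `human` label; C (tie) has no preferred answer.
PREFERRED = {"A": "answer1", "B": "answer2", "C": None}


def parse_pairs(lines: Iterable[str], drop_ties: bool = True) -> Iterator[dict]:
    """Parse JSONL lines into records, optionally skipping ties (label C)."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        label = rec["human"]
        if label not in PREFERRED:
            raise ValueError(f"line {line_no}: unexpected human label {label!r}")
        if drop_ties and label == "C":
            continue
        yield rec


def load_pairs(path: str, drop_ties: bool = True) -> list[dict]:
    """Read a JSONL file from disk; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as f:
        return list(parse_pairs(f, drop_ties))


def chosen_rejected(rec: dict) -> tuple[str, str]:
    """Return (preferred answer text, other answer text) for an A/B record."""
    winner = PREFERRED[rec["human"]]
    if winner is None:
        raise ValueError("a tie has no preferred answer")
    loser = "answer2" if winner == "answer1" else "answer1"
    return rec[winner]["answer"], rec[loser]["answer"]
```

For the example record above, `chosen_rejected` would return the `cogvlm` answer first, because its label is `B`.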
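The card does not specify the reward used during GRPO training. One common pattern, sketched here purely as an assumption, is to prompt the policy model to act as a judge on a pair, parse its final verdict, score it 1.0 when it matches the `human` label and 0.0 otherwise, and then normalize the rewards within each group of completions sampled for the same prompt; that within-group normalization is the "group relative" step in GRPO. The `[[A]]`/`[[B]]` verdict format and all function names are hypothetical, and the sketch uses the population standard deviation, a detail on which implementations differ.

```python
import re
import statistics

# Hypothetical verdict format: the judge ends its completion with [[A]] or [[B]].
VERDICT = re.compile(r"\[\[([AB])\]\]")


def verdict_reward(completion: str, human: str) -> float:
    """1.0 if the last [[A]]/[[B]] verdict matches the human label, else 0.0."""
    found = VERDICT.findall(completion)
    return 1.0 if found and found[-1] == human else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: rewards normalized by their group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]
```

With four sampled verdicts for a record labelled `B`, two correct and two wrong, the rewards are `[1.0, 0.0, 1.0, 0.0]` and the advantages come out at roughly `[1, -1, 1, -1]`, so correct judgments are reinforced and incorrect ones penalized relative to the group. A group in which every completion earns the same reward yields zero advantage and contributes no learning signal.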

Provider:

maas

Created:

2025-12-05
