MMPR-v1.2-prompts
Source: ModelScope community. Updated 2026-01-09; indexed 2025-04-26.
Download link: https://modelscope.cn/datasets/OpenGVLab/MMPR-v1.2-prompts
Resource description:
# MMPR-v1.2-Prompts
[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo) [\[🆕 Blog\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[📜 Paper\]](https://arxiv.org/abs/2411.10442) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)
***These are prompts used to construct [MMPR-v1.2](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2), which greatly improves the overall performance of [InternVL3](https://huggingface.co/papers/2504.10479) across all scales.***
To unzip the archive of images, please first run `cat images.zip_* > images.zip` and then run `unzip images.zip`.
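
Where `cat` and `unzip` are not available (for example on Windows), the same two steps can be sketched in Python. This is a minimal sketch, not part of the official tooling: the function name `reassemble_and_extract` is hypothetical, and it assumes the parts sort lexicographically in the same order the shell glob `images.zip_*` would list them.

```python
import shutil
import zipfile
from pathlib import Path


def reassemble_and_extract(directory, stem="images.zip"):
    """Join the split parts `<stem>_*` into `<stem>`, then extract it in place.

    Hypothetical helper mirroring `cat images.zip_* > images.zip && unzip images.zip`.
    Returns the list of member names found in the archive.
    """
    directory = Path(directory)
    # Assumption: lexicographic order matches the order of the shell glob.
    parts = sorted(directory.glob(f"{stem}_*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {stem}_* in {directory}")

    archive = directory / stem
    with archive.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream each part so large files are never fully loaded in memory.
                shutil.copyfileobj(src, out)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(directory)
        return zf.namelist()
```

Calling `reassemble_and_extract(".")` from the download directory would write `images.zip` next to the parts and unpack it there.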

## Introduction
MMPR is a large-scale and high-quality multimodal reasoning preference dataset. This dataset includes about 3 million samples.


We finetune InternVL2-8B on this dataset with [Mixed Preference Optimization (MPO)](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/#Mix-Preference-Optimization).
The resulting model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B-MPO), outperforms InternVL2-8B on all 8 benchmarks in the table below, with the largest gains on multimodal reasoning tasks.
**On the MathVista benchmark, our model achieves an accuracy of 67.0%**, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10× larger InternVL2-76B.
**On the MathVision benchmark, our model achieves an accuracy of 25.7%**, establishing a new state-of-the-art performance among open-source models.
These results demonstrate the effectiveness of our preference optimization approach in enhancing multimodal reasoning capabilities.
Additionally, on the POPE benchmark, our model exhibits a 1.2-point improvement over InternVL2-8B, suggesting that the perception data contained in our MMPR dataset helps mitigate hallucinations.
Furthermore, our model also outperforms InternVL2-8B on complex VQA benchmarks, indicating that its general abilities are improved as well, benefiting from enhanced reasoning abilities and mitigated hallucinations.
Please refer to our [paper](https://arxiv.org/abs/2411.10442) for more details.

| Model Name | M3CoT | MathVista | MathVision MINI | MMVet (GPT4-Turbo) | LLaVA-Bench | POPE | CRPE | MMHalBench |
| ----------------------- | :---: | :-------: | :-------------: | :----------------: | :---------: | :---: | :---: | :--------: |
| Gemini-1.5-Pro | - | 63.9 | 19.2 | - | - | - | - | - |
| GPT-4o | 64.3 | 63.8 | 30.4 | 69.1 | 97.6 | 86.9 | 76.6 | 4.0 |
| GPT-4o-Mini | 61.9 | 52.4 | 27.3 | 66.9 | 95.4 | 85.1 | 73.1 | 3.6 |
| LLaVA-1.5-13B | 39.5 | 27.6 | 11.1 | 36.3 | 70.7 | 85.9 | 55.6 | 2.4 |
| Qwen2-VL-7B | 57.8 | 58.2 | 21.1 | 60.6 | 67.7 | 88.1 | 74.4 | 3.4 |
| MiniCPM-V-2-6-8B | 56.0 | 60.6 | 23.4 | 57.4 | 83.4 | 87.3 | 75.2 | 3.6 |
| LLaVA-OneVision-7B | 52.3 | 63.2 | 18.4 | 51.4 | 79.9 | 88.4 | 73.7 | 3.1 |
| InternVL2-26B | 58.2 | 59.4 | 23.4 | 62.1 | 92.3 | 88.0 | 75.6 | 3.7 |
| InternVL2-40B | 63.6 | 63.7 | 21.4 | 65.5 | 100.5 | 88.4 | 77.3 | 3.9 |
| InternVL2-76B | 65.4 | 67.5 | 23.7 | 65.7 | 99.3 | 89.0 | 77.8 | 3.8 |
| InternVL2-Pro | 65.6 | 66.3 | 18.8 | 69.4 | 99.5 | 88.2 | 77.6 | 3.7 |
| InternVL2-8B | 59.3 | 58.3 | 20.4 | 54.2 | 73.2 | 86.9 | 75.0 | 3.3 |
| InternVL2-8B-MPO (ours) | 79.2 | 67.0 | 25.7 | 56.2 | 76.7 | 88.1 | 75.4 | 3.5 |
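
The per-benchmark gains quoted in the introduction (8.7 points on MathVista, 1.2 on POPE) can be recomputed from the two InternVL2-8B rows of the table. The sketch below only copies those scores; the names `BENCHMARKS`, `BASE`, `MPO`, and `gains` are illustrative and not part of any released code.

```python
# Scores copied from the InternVL2-8B and InternVL2-8B-MPO rows of the table.
BENCHMARKS = [
    "M3CoT", "MathVista", "MathVision MINI", "MMVet",
    "LLaVA-Bench", "POPE", "CRPE", "MMHalBench",
]
BASE = [59.3, 58.3, 20.4, 54.2, 73.2, 86.9, 75.0, 3.3]
MPO = [79.2, 67.0, 25.7, 56.2, 76.7, 88.1, 75.4, 3.5]


def gains(base, tuned):
    """Return the tuned-minus-base difference per benchmark, rounded to one decimal."""
    return [round(t - b, 1) for b, t in zip(base, tuned)]


DELTAS = dict(zip(BENCHMARKS, gains(BASE, MPO)))

for name, delta in DELTAS.items():
    print(f"{name:16s} {delta:+.1f}")
```

Every difference is positive, and the three largest (M3CoT, MathVista, MathVision MINI) are the reasoning benchmarks.
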

Additionally, we finetune the InternVL2.5 series with MPO using this dataset. On the OpenCompass leaderboard, the resulting models outperform their counterparts without MPO at every scale, by 1.1 to 2.5 points in average score (roughly 2 points on average).

| Model | Avg. | MMBench v1.1 | MMStar | MMMU | MathVista | HallusionBench | AI2D | OCRBench | MMVet |
| ------------------- | ---- | ------------ | ------ | ---- | --------- | -------------- | ---- | -------- | ----- |
| InternVL2-5-1B | 54.9 | 66.5 | 51.3 | 41.2 | 47.1 | 39.4 | 69.0 | 77.4 | 47.2 |
| InternVL2-5-1B-MPO | 56.4 | 67.2 | 49.7 | 40.8 | 53.0 | 40.0 | 69.4 | 83.6 | 47.2 |
| InternVL2-5-2B | 59.9 | 70.9 | 54.3 | 43.2 | 51.1 | 42.3 | 74.9 | 80.2 | 62.6 |
| InternVL2-5-2B-MPO | 62.0 | 71.6 | 55.0 | 45.0 | 56.4 | 43.0 | 75.3 | 84.2 | 65.4 |
| InternVL2-5-4B | 65.1 | 78.2 | 58.7 | 51.8 | 60.8 | 46.6 | 81.4 | 82.0 | 61.5 |
| InternVL2-5-4B-MPO | 67.6 | 78.6 | 60.2 | 51.6 | 65.3 | 47.8 | 82.0 | 88.0 | 67.1 |
| InternVL2-5-8B | 68.9 | 82.5 | 63.2 | 56.2 | 64.5 | 49.0 | 84.6 | 82.1 | 62.8 |
| InternVL2-5-8B-MPO | 70.4 | 82.4 | 65.7 | 54.9 | 68.9 | 51.4 | 84.5 | 88.3 | 66.9 |
| InternVL2-5-26B | 71.6 | 84.6 | 66.5 | 60.7 | 68.0 | 55.8 | 86.2 | 85.4 | 65.4 |
| InternVL2-5-26B-MPO | 72.7 | 84.2 | 67.2 | 57.7 | 72.8 | 55.3 | 86.2 | 91.2 | 67.1 |
| InternVL2-5-38B | 73.5 | 85.4 | 68.5 | 64.6 | 72.4 | 57.9 | 87.6 | 84.1 | 67.2 |
| InternVL2-5-38B-MPO | 75.5 | 85.6 | 69.8 | 64.1 | 73.8 | 61.5 | 88.1 | 88.5 | 72.5 |
| InternVL2-5-78B | 75.2 | 87.5 | 69.5 | 70.0 | 70.6 | 57.4 | 89.1 | 85.3 | 71.8 |
| InternVL2-5-78B-MPO | 76.6 | 87.3 | 73.1 | 68.3 | 73.8 | 58.7 | 89.3 | 91.2 | 71.4 |
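
As a quick check of the "average of 2 points" figure, the gain at each scale can be computed from the `Avg.` column of the table. This is a minimal sketch with illustrative names (`AVG_SCORES`, `average_gains`); the numbers are copied from the table and nothing else is assumed.

```python
from statistics import mean

# "Avg." column of the table: scale -> (without MPO, with MPO).
AVG_SCORES = {
    "1B": (54.9, 56.4),
    "2B": (59.9, 62.0),
    "4B": (65.1, 67.6),
    "8B": (68.9, 70.4),
    "26B": (71.6, 72.7),
    "38B": (73.5, 75.5),
    "78B": (75.2, 76.6),
}


def average_gains(scores):
    """Return the MPO-minus-base gain per scale, rounded to one decimal."""
    return {scale: round(tuned - base, 1) for scale, (base, tuned) in scores.items()}


GAINS = average_gains(AVG_SCORES)
MEAN_GAIN = round(mean(GAINS.values()), 2)

print(GAINS)
print("mean gain:", MEAN_GAIN)
```

The per-scale gains range from 1.1 (26B) to 2.5 (4B), with a mean of about 1.7 points, which the text rounds to 2.
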
## Usage
Please refer to [our document](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html).
## Data fields
| Key | Description |
| ---------- | ----------------------------------- |
| `image` | Image path. |
| `question` | Input query. |
| `chosen` | Chosen response for the question. |
| `rejected` | Rejected response for the question. |
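
A record with these four keys can be validated before training. The sketch below assumes the annotations are stored as JSON Lines (one JSON object per line); that storage format, the helper name `parse_preference_records`, and the file name in the usage note are assumptions for illustration, so check the linked documentation for the actual layout.

```python
import json

# The four keys documented in the table above.
REQUIRED_KEYS = ("image", "question", "chosen", "rejected")


def parse_preference_records(lines):
    """Parse an iterable of JSON Lines strings into preference records.

    Hypothetical helper: raises KeyError if a record lacks a documented key.
    """
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise KeyError(f"line {number}: missing keys {missing}")
        records.append(record)
    return records
```

For a file on disk, something like `with open("annotations.jsonl", encoding="utf-8") as f: records = parse_preference_records(f)` would work, where `annotations.jsonl` is a placeholder name.
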
## Citation
If you find this project useful in your research, please consider citing:
```BibTeX
@article{wang2024mpo,
title={Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization},
author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and Dai, Jifeng},
journal={arXiv preprint arXiv:2411.10442},
year={2024}
}
```
Provider: maas
Created: 2025-04-22
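Dataset summary
MMPR-v1.2-prompts is a multimodal reasoning preference dataset of about 3 million samples, used to optimize the InternVL model series and substantially improve their performance on multimodal reasoning tasks. Its key fields are the image path, the question, the chosen response, and the rejected response.
The summary above was collected and generated by 遇见数据集.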



