MMPR-v1.1

Source: ModelScope community. Updated 2026-04-28; indexed 2024-12-28.
Download link: https://modelscope.cn/datasets/OpenGVLab/MMPR-v1.1
# MMPR-v1.1

[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo) [\[🆕 Blog\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[📜 Paper\]](https://arxiv.org/abs/2411.10442) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)

***`2025/04/11:` We release a new version of MMPR (i.e., [MMPR-v1.2](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2)), which greatly enhances the overall performance of InternVL3.***

<!-- ***This is a newer version of [MMPR](https://huggingface.co/datasets/OpenGVLab/MMPR), which includes additional data sources to enhance the data diversity and improves the performance of InternVL2.5 by an average of 2 points across all scales on the OpenCompass leaderboard.*** -->

To unzip the image archive, first run `cat images.zip_* > images.zip` to join the split parts, then run `unzip images.zip`.

## Introduction

MMPR is a large-scale, high-quality multimodal reasoning preference dataset. It contains about 3 million samples.

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/619507e7b74b6c591f794340/mmXL47UPDFwYOWdn9Z6j5.jpeg)

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/619507e7b74b6c591f794340/6fnvI_wCd9JXAs6vYthaG.jpeg)

We fine-tune InternVL2-8B with [MPO](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/#Mix-Preference-Optimization) using this dataset.
The resulting model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B-MPO), improves on InternVL2-8B across all 8 benchmarks in the table below, with the largest gains on multimodal reasoning tasks.
**On the MathVista benchmark, our model achieves an accuracy of 67.0%**, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the \\(10\times\\) larger InternVL2-76B.
**On the MathVision benchmark, our model achieves an accuracy of 25.7%**, establishing a new state-of-the-art performance among open-source models.
These results demonstrate the effectiveness of our preference optimization approach in enhancing multimodal reasoning capabilities.
Additionally, on the POPE benchmark, our model exhibits a 1.2-point improvement over InternVL2-8B, indicating that the perception data contained in MMPR helps mitigate hallucinations.
Our model also outperforms InternVL2-8B on complex VQA benchmarks, indicating that its general abilities improve as well, benefiting from enhanced reasoning and reduced hallucination.
Please refer to our [paper](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) for more details.

| Model Name              | M3CoT | MathVista | MathVision MINI | MMVet (GPT4-Turbo) | LLaVA-Bench | POPE  | CRPE  | MMHalBench |
| ----------------------- | :---: | :-------: | :-------------: | :----------------: | :---------: | :---: | :---: | :--------: |
| Gemini-1.5-Pro          |   -   |   63.9    |      19.2       |         -          |      -      |   -   |   -   |     -      |
| GPT-4o                  | 64.3  |   63.8    |      30.4       |        69.1        |    97.6     | 86.9  | 76.6  |    4.0     |
| GPT-4o-Mini             | 61.9  |   52.4    |      27.3       |        66.9        |    95.4     | 85.1  | 73.1  |    3.6     |
| LLaVA-1.5-13B           | 39.5  |   27.6    |      11.1       |        36.3        |    70.7     | 85.9  | 55.6  |    2.4     |
| Qwen2-VL-7B             | 57.8  |   58.2    |      21.1       |        60.6        |    67.7     | 88.1  | 74.4  |    3.4     |
| MiniCPM-V-2-6-8B        | 56.0  |   60.6    |      23.4       |        57.4        |    83.4     | 87.3  | 75.2  |    3.6     |
| LLaVA-OneVision-7B      | 52.3  |   63.2    |      18.4       |        51.4        |    79.9     | 88.4  | 73.7  |    3.1     |
| InternVL2-26B           | 58.2  |   59.4    |      23.4       |        62.1        |    92.3     | 88.0  | 75.6  |    3.7     |
| InternVL2-40B           | 63.6  |   63.7    |      21.4       |        65.5        |    100.5    | 88.4  | 77.3  |    3.9     |
| InternVL2-76B           | 65.4  |   67.5    |      23.7       |        65.7        |    99.3     | 89.0  | 77.8  |    3.8     |
| InternVL2-Pro           | 65.6  |   66.3    |      18.8       |        69.4        |    99.5     | 88.2  | 77.6  |    3.7     |
| InternVL2-8B            | 59.3  |   58.3    |      20.4       |        54.2        |    73.2     | 86.9  | 75.0  |    3.3     |
| InternVL2-8B-MPO (ours) | 79.2  |   67.0    |      25.7       |        56.2        |    76.7     | 88.1  | 75.4  |    3.5     |

Additionally, we fine-tune the InternVL2.5 series with MPO using this dataset.
The resulting models outperform their counterparts without MPO by an average of 2 points across all scales on the OpenCompass leaderboard.

| Model               | Avg. | MMBench v1.1 | MMStar | MMMU | MathVista | HallusionBench | AI2D | OCRBench | MMVet |
| ------------------- | ---- | ------------ | ------ | ---- | --------- | -------------- | ---- | -------- | ----- |
| InternVL2-5-1B      | 54.9 | 66.5         | 51.3   | 41.2 | 47.1      | 39.4           | 69.0 | 77.4     | 47.2  |
| InternVL2-5-1B-MPO  | 56.4 | 67.2         | 49.7   | 40.8 | 53.0      | 40.0           | 69.4 | 83.6     | 47.2  |
| InternVL2-5-2B      | 59.9 | 70.9         | 54.3   | 43.2 | 51.1      | 42.3           | 74.9 | 80.2     | 62.6  |
| InternVL2-5-2B-MPO  | 62.0 | 71.6         | 55.0   | 45.0 | 56.4      | 43.0           | 75.3 | 84.2     | 65.4  |
| InternVL2-5-4B      | 65.1 | 78.2         | 58.7   | 51.8 | 60.8      | 46.6           | 81.4 | 82.0     | 61.5  |
| InternVL2-5-4B-MPO  | 67.6 | 78.6         | 60.2   | 51.6 | 65.3      | 47.8           | 82.0 | 88.0     | 67.1  |
| InternVL2-5-8B      | 68.9 | 82.5         | 63.2   | 56.2 | 64.5      | 49.0           | 84.6 | 82.1     | 62.8  |
| InternVL2-5-8B-MPO  | 70.4 | 82.4         | 65.7   | 54.9 | 68.9      | 51.4           | 84.5 | 88.3     | 66.9  |
| InternVL2-5-26B     | 71.6 | 84.6         | 66.5   | 60.7 | 68.0      | 55.8           | 86.2 | 85.4     | 65.4  |
| InternVL2-5-26B-MPO | 72.7 | 84.2         | 67.2   | 57.7 | 72.8      | 55.3           | 86.2 | 91.2     | 67.1  |
| InternVL2-5-38B     | 73.5 | 85.4         | 68.5   | 64.6 | 72.4      | 57.9           | 87.6 | 84.1     | 67.2  |
| InternVL2-5-38B-MPO | 75.5 | 85.6         | 69.8   | 64.1 | 73.8      | 61.5           | 88.1 | 88.5     | 72.5  |
| InternVL2-5-78B     | 75.2 | 87.5         | 69.5   | 70.0 | 70.6      | 57.4           | 89.1 | 85.3     | 71.8  |
| InternVL2-5-78B-MPO | 76.6 | 87.3         | 73.1   | 68.3 | 73.8      | 58.7           | 89.3 | 91.2     | 71.4  |

## Usage

Please refer to [our document](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html).

## Data fields

| Key        | Description                         |
| ---------- | ----------------------------------- |
| `image`    | Image path.                         |
| `question` | Input query.                        |
| `chosen`   | Chosen response for the question.   |
| `rejected` | Rejected response for the question. |

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{wang2024mpo,
  title={Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization},
  author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2411.10442},
  year={2024}
}
@article{chen2024expanding,
  title={Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling},
  author={Chen, Zhe and Wang, Weiyun and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Cui, Erfei and Zhu, Jinguo and Ye, Shenglong and Tian, Hao and Liu, Zhaoyang and others},
  journal={arXiv preprint arXiv:2412.05271},
  year={2024}
}
@article{chen2024far,
  title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
  journal={arXiv preprint arXiv:2404.16821},
  year={2024}
}
@article{chen2023internvl,
  title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
}
```
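
As an illustration of the four keys in the Data fields table (`image`, `question`, `chosen`, `rejected`), the following is a minimal Python sketch of how one preference record might be parsed and checked. It assumes the annotations are stored as JSON Lines with one object per line and that every field is a string; neither is confirmed by this page. The names `PreferencePair` and `parse_pairs` and the sample record are hypothetical and exist only for this example.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# The four keys come from the "Data fields" table. Everything else here
# (JSON Lines storage, string-typed fields, the class and function names)
# is an assumption made for illustration.
REQUIRED_KEYS = ("image", "question", "chosen", "rejected")


@dataclass(frozen=True)
class PreferencePair:
    image: str     # image path
    question: str  # input query
    chosen: str    # chosen (preferred) response
    rejected: str  # rejected response


def parse_pairs(lines: Iterable[str]) -> Iterator[PreferencePair]:
    """Yield one PreferencePair per non-empty JSON line, checking required keys."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {number}: missing keys {missing}")
        # Keep only the documented keys; extra keys are ignored.
        yield PreferencePair(**{key: record[key] for key in REQUIRED_KEYS})


# Hypothetical sample record, made up for this example (not taken from MMPR).
sample = [
    json.dumps({
        "image": "images/example.jpg",
        "question": "What is 2 + 3?",
        "chosen": "2 + 3 = 5.",
        "rejected": "2 + 3 = 6.",
    })
]
pairs = list(parse_pairs(sample))
print(pairs[0].chosen)
```

To read a real annotation file, an open file handle can be passed in place of `sample`, since iterating over a text file yields its lines. The official document linked under Usage remains the authoritative description of the file layout.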

Provider: maas
Created: 2024-12-26