VisualProcessBench | Multimodal Reasoning Dataset | Model Evaluation Dataset

Download link:

https://modelscope.cn/datasets/OpenGVLab/VisualProcessBench


Description:

# VisualProcessBench [\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[📜 Paper\]](https://arxiv.org/abs/2503.10291) [\[🆕 Blog\]](https://internvl.github.io/blog/2025-03-13-VisualPRM/) [\[🤗 model\]](https://huggingface.co/OpenGVLab/VisualPRM-8B) [\[🤗 dataset\]](https://huggingface.co/datasets/OpenGVLab/VisualPRM400K) [\[🤗 benchmark\]](https://huggingface.co/datasets/OpenGVLab/VisualProcessBench)

VisualProcessBench is a benchmark designed to measure how well process reward models (PRMs) and multimodal large language models (MLLMs) identify erroneous steps in multimodal reasoning tasks. It comprises 2,866 samples with a total of 26,950 human-annotated step-wise correctness labels.

## Data fields

- Data fields for each sample:

| Key            | Description                                                                |
| -------------- | -------------------------------------------------------------------------- |
| `image`        | List of image paths.                                                       |
| `question`     | Input query.                                                               |
| `answer`       | Ground-truth answer to this question.                                      |
| `response`     | The model-generated response to this question, split into multiple steps. |
| `policy_model` | The model used to generate the response.                                   |
| `data_source`  | The source of this question.                                               |

- Data fields for each response:

| Key                   | Description                                                                                               |
| --------------------- | --------------------------------------------------------------------------------------------------------- |
| `steps`               | Steps of this response.                                                                                   |
| `process_correctness` | Correctness annotation of each step: 1, 0, and -1 denote correct, neutral, and incorrect, respectively. |

## Data Examples

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/example-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mmmu-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mmmu-2.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mmmu-3.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mathverse-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mathverse-2.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mathverse-3.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/DynaMath-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/DynaMath-2.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/DynaMath-3.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mathvision-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mathvision-2.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/mathvision-3.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/wemath-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/wemath-2.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/benchmark-examples/wemath-3.png?raw=true)

## License

This project is released under the MIT License. This project uses the pre-trained internlm2_5-7b-chat as a component, which is licensed under the Apache License 2.0.

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{wang2025visualprm,
  title={VisualPRM: An Effective Process Reward Model for Multimodal Reasoning},
  author={Wang, Weiyun and Gao, Zhangwei and Chen, Lianjie and Chen, Zhe and Zhu, Jinguo and Zhao, Xiangyu and Liu, Yangzhou and Cao, Yue and Ye, Shenglong and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2503.10291},
  year={2025}
}
```
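The `steps` and `process_correctness` fields described under Data fields lend themselves to simple programmatic checks. The sketch below is a minimal illustration, assuming each sample has already been loaded as a plain Python dict with the documented keys (the on-disk file format is not specified on this page). It checks that every step has exactly one label, counts labels, and scores a hypothetical step-level judge on flagging incorrect steps. The function names, the F1 metric, and the choice to skip neutral steps are all assumptions made for illustration, not the benchmark's official evaluation protocol, and the toy record is made up rather than taken from the dataset.

```python
from collections import Counter

# Step labels as documented for `process_correctness`:
# 1 = correct, 0 = neutral, -1 = incorrect.
CORRECT, NEUTRAL, INCORRECT = 1, 0, -1


def check_sample(sample):
    """Raise ValueError if a sample's steps and labels do not line up.

    Assumes `sample["response"]` is a dict holding the `steps` and
    `process_correctness` lists described in the data card.
    """
    response = sample["response"]
    steps, labels = response["steps"], response["process_correctness"]
    if len(steps) != len(labels):
        raise ValueError(f"{len(steps)} steps but {len(labels)} labels")
    unexpected = [x for x in labels if x not in (CORRECT, NEUTRAL, INCORRECT)]
    if unexpected:
        raise ValueError(f"unexpected label values: {unexpected}")


def label_counts(samples):
    """Count step-level labels (1, 0, -1) across all samples."""
    counts = Counter()
    for sample in samples:
        check_sample(sample)
        counts.update(sample["response"]["process_correctness"])
    return counts


def incorrect_step_f1(samples, predictions):
    """F1 score for flagging incorrect steps (hypothetical metric).

    `predictions` holds one list of booleans per sample; True means the
    judge marked that step as incorrect. Neutral steps are skipped, which
    is an assumption made for this sketch.
    """
    if len(samples) != len(predictions):
        raise ValueError("need one prediction list per sample")
    tp = fp = fn = 0
    for sample, flags in zip(samples, predictions):
        labels = sample["response"]["process_correctness"]
        if len(flags) != len(labels):
            raise ValueError("need one prediction per step")
        for label, flagged in zip(labels, flags):
            if label == NEUTRAL:
                continue  # neutral steps count as neither positive nor negative
            actual = label == INCORRECT
            if flagged and actual:
                tp += 1
            elif flagged:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy record that only mirrors the documented field names.
toy = [{
    "image": ["toy/image.png"],
    "question": "toy question",
    "answer": "toy answer",
    "response": {"steps": ["step 1", "step 2", "step 3"],
                 "process_correctness": [1, 0, -1]},
    "policy_model": "toy-model",
    "data_source": "toy",
}]
print(label_counts(toy)[INCORRECT])                   # 1
print(incorrect_step_f1(toy, [[False, True, True]]))  # 1.0
```

Treating "incorrect" as the positive class mirrors the benchmark's stated goal of identifying erroneous steps; how neutral steps should be scored is a separate design decision that the official evaluation code, rather than this sketch, should settle.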
