Download link:

https://modelscope.cn/datasets/OpenGVLab/VisualPRM400K-v1.1-Raw

Resource overview:

# VisualPRM400K-v1.1

[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[📜 Paper\]](https://arxiv.org/abs/2503.10291) [\[🆕 Blog\]](https://internvl.github.io/blog/2025-03-13-VisualPRM/) [\[🤗 model\]](https://huggingface.co/OpenGVLab/VisualPRM-8B) [\[🤗 dataset\]](https://huggingface.co/datasets/OpenGVLab/VisualPRM400K-v1.1) [\[🤗 benchmark\]](https://huggingface.co/datasets/OpenGVLab/VisualProcessBench)

***NOTE: VisualPRM400K-v1.1 is a new version of VisualPRM400K and is used to train [VisualPRM-8B-v1.1](https://huggingface.co/OpenGVLab/VisualPRM-8B-v1.1). Compared to the original version, v1.1 includes additional data sources and prompts during rollout sampling to enhance data diversity.***

VisualPRM400K is a dataset comprising approximately 400K multimodal process supervision samples. We generate the data using an automatic data pipeline. The key idea is to estimate the expected accuracy \\(mc_i\\) of the given step \\(s_{\leq i}\\) based on Monte Carlo sampling and to consider the step correct if \\(mc_i>0\\). Please see our [paper](https://arxiv.org/abs/2503.10291) or [blog](https://internvl.github.io/blog/2025-03-13-VisualPRM/) for more details.

NOTE: To use the annotations formatted as multi-turn conversations, please refer to [this version](https://huggingface.co/datasets/OpenGVLab/VisualPRM400K-v1.1).
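The labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes that \\(mc_i\\) is estimated as the fraction of sampled continuations that reach the correct final answer, and the names `expected_accuracy`, `is_step_correct`, and `outcomes` are hypothetical.

```python
from typing import Sequence


def expected_accuracy(continuation_correct: Sequence[bool]) -> float:
    """Monte Carlo estimate of mc_i (assumed here to be the fraction of
    sampled continuations that end in the correct final answer)."""
    if len(continuation_correct) == 0:
        raise ValueError("at least one sampled continuation is required")
    return sum(continuation_correct) / len(continuation_correct)


def is_step_correct(mc: float) -> bool:
    """Labeling rule from the text: a step counts as correct if mc_i > 0."""
    return mc > 0


# Hypothetical outcomes of four continuations sampled after one step.
outcomes = [True, False, False, False]
mc = expected_accuracy(outcomes)
print(mc, is_step_correct(mc))  # prints: 0.25 True
```

Under this rule a step is labeled incorrect only when none of its sampled continuations reaches the correct answer.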
## Data Examples

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/example-1.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/ocr.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/document.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/math.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/science.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/general.png?raw=true)

![image/png](https://github.com/InternVL/InternVL.github.io/blob/main/blog/2025-03-13-VisualPRM/images/data-examples/chart.png?raw=true)

## Data fields

- Data fields for each sample:

  | Key                | Description                                                            |
  | ------------------ | ---------------------------------------------------------------------- |
  | `image`            | Image path.                                                            |
  | `question`         | Input query.                                                           |
  | `answer`           | Ground Truth for the question.                                         |
  | `response`         | Sampled response for the question.                                     |
  | `steps_with_score` | The split steps for the response.                                      |
  | `num_mc_sequences` | The number of continuations sampled to estimate the expected accuracy. |

- Data fields for each response:

  | Key              | Description                                                            |
  | ---------------- | ---------------------------------------------------------------------- |
  | `step`           | The content of the step.                                               |
  | `score`          | The expected accuracy of the step.                                     |
  | `num_mc_correct` | The number of correct continuations.                                   |
  | `num_mc_total`   | The number of continuations sampled to estimate the expected accuracy. |

## License

This project is released under the MIT License. This project uses the pre-trained internlm2_5-7b-chat as a component, which is licensed under the Apache License 2.0.

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{wang2025visualprm,
  title={VisualPRM: An Effective Process Reward Model for Multimodal Reasoning},
  author={Wang, Weiyun and Gao, Zhangwei and Chen, Lianjie and Chen, Zhe and Zhu, Jinguo and Zhao, Xiangyu and Liu, Yangzhou and Cao, Yue and Ye, Shenglong and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2503.10291},
  year={2025}
}
```
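As a rough illustration of how the keys listed under "Data fields" fit together, the sketch below walks one record and attaches a correctness label to each step. It assumes that `steps_with_score` is a list of dictionaries carrying the per-response keys (`step`, `score`, `num_mc_correct`, `num_mc_total`); the on-disk file format is not covered here. The helper `label_steps` and the record `toy_sample` are hypothetical, and the values in `toy_sample` are invented for illustration rather than taken from the dataset.

```python
from typing import Any, Dict, List, Tuple


def label_steps(sample: Dict[str, Any]) -> List[Tuple[str, float, bool]]:
    """Return (step text, score, is_correct) for every step of one sample.

    A step is treated as correct when its expected accuracy is above 0.
    """
    labeled = []
    for entry in sample["steps_with_score"]:
        score = entry["score"]
        labeled.append((entry["step"], score, score > 0))
    return labeled


# Invented record that only mirrors the documented keys (not real data).
toy_sample = {
    "image": "images/example.png",
    "question": "What is 2 + 3?",
    "answer": "5",
    "response": "Add the numbers. The result is 6.",
    "num_mc_sequences": 4,
    "steps_with_score": [
        {"step": "Add the numbers.", "score": 0.75,
         "num_mc_correct": 3, "num_mc_total": 4},
        {"step": "The result is 6.", "score": 0.0,
         "num_mc_correct": 0, "num_mc_total": 4},
    ],
}

for step, score, ok in label_steps(toy_sample):
    print(f"{score:.2f}  {'correct' if ok else 'incorrect'}  {step}")
```

Keeping the raw `score` next to the binary label lets downstream code choose a different threshold later without re-reading the data.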

Application scenarios: