PaCoRe-Train-8k
收藏魔搭社区2026-01-08 更新2025-12-13 收录
下载链接:
https://modelscope.cn/datasets/stepfun-ai/PaCoRe-Train-8k
下载链接
链接失效反馈官方服务:
资源简介:
# PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
<div align="center">
[**Read the Paper**](https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf) | [**Download Models**](https://huggingface.co/stepfun-ai/PaCoRe-8B) | [**Training Data**](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k)
</div>
## 📖 Overview
We introduce **PaCoRe (Parallel Coordinated Reasoning)**, a framework that shifts the driver of inference from sequential depth to **coordinated parallel breadth**, breaking the model context limitation and massively scaling test time compute:
* **Think in Parallel:** PaCoRe launches massive parallel exploration trajectories.
* **Coordinate in Multi-rounds:** It employs a message-passing architecture to compact these thoughts into concise messages and synthesize them to guide the next round.
Trained via large-scale, outcome-based reinforcement learning, PaCoRe masters the **Reasoning Synthesis** capabilities required to reconcile diverse parallel insights.
The approach yields strong improvements across diverse domains, and notably pushes reasoning beyond frontier systems in mathematics: an 8B model reaches 94.5\% on HMMT 2025, surpassing GPT-5’s 93.2\% by scaling effective TTC to roughly two million tokens.
We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work!
------
<p align="center">
<img src="figure/teaser_draft_02.png" width="48%" />
<img src="figure/before_after_train_lcb_02.png" width="48%" />
</p>
*Figure 1 | Parallel Coordinated Reasoning (PaCoRe) performance. Left: On HMMT 2025, PaCoRe-8B demonstrates remarkable test-time scaling, yielding steady gains and ultimately surpassing GPT-5. Right: On LiveCodeBench, the RLVR-8B model fails to leverage increased test-time compute, while PaCoRe-8B model effectively unlocks substantial gains as the test-time compute increases.*
<p align="center">
<img src="figure/train_reward_response_length_1130.png" width="48%" />
<img src="figure/benchmark_accuracy_1130.png" width="48%" />
</p>
*Figure 2 | PaCoRe Training dynamics. Left panels: The Training Reward and Response Length steadily increase, demonstrating the training stability and effectiveness. Right panels: Evaluation on HMMT 2025 and LiveCodeBench (2408-2505). Performance is reported using single round coordinated reasoning in PaCoRe inference setting with $\vec{K} = [16]$.*
## 🔥 Releases
**[2025/12/09]** We are excited to release the **PaCoRe-8B** ecosystem:
* 📝 **In-depth Technical Report:** [**PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning.**](https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf)
* 🤖 **Model:**
* [PaCoRe-8B](https://huggingface.co/stepfun-ai/PaCoRe-8B): Our final PaCoRe-trained model checkpoint!
* [RLVR-8B-0926](https://huggingface.co/stepfun-ai/RLVR-8B-0926): The initial checkpoint of our study, conducted strong reasoning-oriented post-trained on [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base).
* 📚 **Data:** [PaCoRe-Train-8k](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k) The high-quality training corpus, including `opensource_math`, `public_mathcontest`, `synthetic_math` and `code`:
* 🤗 Stage1-3k: [PaCoRe-Train-Stage1-3k](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k/stage1)
* 🤗 Stage2-5k: [PaCoRe-Train-Stage2-5k](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k/stage2)
## 🔍 Experiments
<table class="tg">
<thead>
<tr>
<th class="tg-header"></th>
<th class="tg-data">HMMT 2025</th>
<th class="tg-data">LiveCodeBench</th>
<th class="tg-data">HLE<sub>text</sub></th>
<th class="tg-data">MultiChallenge</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-header">GPT-5</td>
<td class="tg-data">93.2 (16k)</td>
<td class="tg-data"><b>83.5</b> (13k)</td>
<td class="tg-data"><b>26.0</b> (14k)</td>
<td class="tg-data"><b>71.1</b> (5.0k)</td>
</tr>
<tr>
<td class="tg-header">Qwen3-235B-Thinking</td>
<td class="tg-data">82.3 (32k)</td>
<td class="tg-data">74.5 (21k)</td>
<td class="tg-data">18.2 (23k)</td>
<td class="tg-data">60.3 (1.6k)</td>
</tr>
<tr>
<td class="tg-header">GLM-4.6</td>
<td class="tg-data">88.7 (25k)</td>
<td class="tg-data">79.5 (19k)</td>
<td class="tg-data">17.2 (21k)</td>
<td class="tg-data">54.9 (2.2k)</td>
</tr>
<tr>
<td class="tg-header">DeepSeek-v3.1-Terminus</td>
<td class="tg-data">86.1 (20k)</td>
<td class="tg-data">74.9 (11k)</td>
<td class="tg-data">19.3 (18k)</td>
<td class="tg-data">54.4 (1.1k)</td>
</tr>
<tr class="tg-midrule">
<td class="tg-header">Kimi-K2-Thinking</td>
<td class="tg-data">86.5 (33k)</td>
<td class="tg-data">79.2 (25k)</td>
<td class="tg-data">23.9 (29k)</td>
<td class="tg-data">66.4 (1.7k)</td>
</tr>
<tr class="tg-midrule">
<td class="tg-header">RLVR-8B</td>
<td class="tg-data">75.4 (48k)</td>
<td class="tg-data">70.6 (34k)</td>
<td class="tg-data">9.3 (35k)</td>
<td class="tg-data">33.3 (1.7k)</td>
</tr>
<tr>
<td class="tg-header"><b>PaCoRe-8B (low)</b></td>
<td class="tg-data">88.2 (243k)</td>
<td class="tg-data">75.8 (188k)</td>
<td class="tg-data">13.0 (196k)</td>
<td class="tg-data">41.8 (13k)</td>
</tr>
<tr>
<td class="tg-header"><b>PaCoRe-8B (medium)</b></td>
<td class="tg-data">92.9 (869k)</td>
<td class="tg-data">76.7 (659k)</td>
<td class="tg-data">14.6 (694k)</td>
<td class="tg-data">45.7 (45k)</td>
</tr>
<tr class="tg-bottom">
<td class="tg-header"><b>PaCoRe-8B (high)</b></td>
<td class="tg-data"><b>94.5</b> (1796k)</td>
<td class="tg-data">78.2 (1391k)</td>
<td class="tg-data">16.2 (1451k)</td>
<td class="tg-data">47.0 (95k)</td>
</tr>
</tbody>
</table>
*Table 1 | For each benchmark, we report accuracy together with total TTC (in thousands). For *Low*, *Medium*, and *High*, we apply the inference trajectory configuration as $\vec{K}=[4]$, $[16]$, and $[32, 4]$ separately.*
### Key Findings
* **Message Passing Unlocks Scaling.** Without compaction, performance flatlines at the context limit. PaCoRe breaks the memory barrier and lets reasoning scale freely.
* **Breadth > Depth.** All compute is not equal. Coordinated parallel reasoning delivers far higher returns than extending a single chain.
* **Data as a Force Multiplier.** The PaCoRe corpus provides exceptionally valuable supervision—even baseline models see substantial gains when trained on it.
## Getting Started 🚀
### Data
The data is provided as a `list[dict]`, where each entry represents a training instance:
* `conversation`: The original problem/prompt messages.
* `responses`: A list of cached generated responses (trajectories). These serve as the **input messages ($M$)** used during PaCoRe training.
* `ground_truth`: The verifiable answer used for correctness evaluation.
### Model Serving
You can directly use `vllm serve` to serve the model! More inference details of PaCoRe will be handled in Inference Pipeline.
### Inference Pipeline

*Figure 3 | Inference pipeline of PaCoRe. Each round launches broad parallel exploration, compacts the resulting trajectories into compacted messages, and feeds these messages together with the question forward to coordinate the next round. Repeating this process $\hat{R}$ times yields multi-million-token effective TTC while respecting fixed context limits, with the final compacted message serving as the system’s answer.*
Inference code coming soon!
## 🙏 Acknowledgements
- This work was supported by computing resources and infrastructure provided by [StepFun](https://www.stepfun.com/) and Tsinghua University.
- We are deeply grateful to our colleagues for their support:
* Inference: Song Yuan, Wuxun Xie, Mingliang Li, Bojun Wang.
* Training: Xing Chen, Yuanwei Lu, Changyi Wan, Yu Zhou.
* Infra Operations: Shaoliang Pang, Changxin Miao, Xu Zhao, Wei Zhang, Zidong Yang, Junzhe Lin, Yuxiang Yang, Chen Xu, Xin Li, Bin Wang.
* Data Management: Xiaoxiao Ren, Zhiguo Huang, and Kang An.
* Helpful Discussions: Liang Zhao, Jianjian Sun, Zejia Weng, JingJing Xie.
- We are grateful for colleagues from StepFun and Tsinghua University for their valuable feedback and contributions.
- Our work is built on amazing open source models and data; thanks again!
## 🔮 Future Work
We are just scratching the surface of parallel coordinated reasoning. Our roadmap includes:
- **Scaling the Extremes**: We plan to apply PaCoRe to stronger foundation models, expanding the task domains, and further scaling up both the breadth (parallel trajectories) and depth (coordination rounds) to tackle challenges currently deemed unsolvable.
- **Boosting Token Intelligence Density**: While we currently scale by volume, we aim to maximize the utility of every unit of compute spent. This involves enabling more efficient parallel exploration through better organization, cooperation, and division of labor among trajectories.
- **Emergent Multi-Agent Intelligence**: We are interested in exploring the joint training of both the synthesis policy and the message-passing mechanism, laying minimal yet rich cooperative multi-agent learning environment, offering a valuable playground for studying emergent communication, self-organization, and collective intelligence.
- **Ouroboros for Pre- and Post-Training**: we intend to investigate the development of advanced synthetic data generation techniques with PaCoRe pipeline to improve both current pretraining and post-training processes.
## Advertisement Time 📣
We are currently seeking self-motivated engineers and reseachers.
If you are interested in our project and would like to contribute to the reasoner scale-up all the way to AGI, please feel free to reach out to us at hanqer@stepfun.com
## 📜 Citation
```bibtex
@misc{pacore2025,
title={PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning},
author={Jingcheng Hu and Yinmin Zhang and Shijie Shang and Xiaobo Yang and Yue Peng and Zhewei Huang and Hebin Zhou and Xin Wu and Jie Cheng and Fanqi Wan and Xiangwen Kong and Chengyuan Yao and Ailin Huang and Hongyu Zhou and Qi Han and Zheng Ge and Daxin Jiang and Xiangyu Zhang and Heung-Yeung Shum},
year={2025},
url={[https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf](https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf)},
}
```
# PaCoRe:基于并行协同推理的测试时计算缩放学习
<div align="center">
[**阅读论文**](https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf) | [**下载模型**](https://huggingface.co/stepfun-ai/PaCoRe-8B) | [**训练数据**](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k)
</div>
## 📖 概览
我们提出**PaCoRe(并行协同推理,Parallel Coordinated Reasoning)**框架,将推理的驱动逻辑从串行深度拓展转向**协同并行广度**,突破模型上下文限制,大幅拓展测试时计算(Test-Time Compute, TTC)的规模:
* **并行思考**:PaCoRe 启动大规模并行探索轨迹。
* **多轮协同**:采用消息传递架构将各类思考压缩为简洁消息,并通过合成这些消息指导下一轮推理。
PaCoRe 通过大规模基于结果的强化学习进行训练,掌握了协调各类并行推理结果所需的**推理合成(Reasoning Synthesis)**能力。
该方法在多个领域均取得显著性能提升,尤其在数学推理任务上超越了当前前沿系统:8B 参数量的 PaCoRe 模型在 HMMT 2025 数据集上达到 94.5% 的准确率,通过将有效测试时计算规模拓展至约 200 万 Token,超越了 GPT-5 的 93.2%。
我们开源了模型检查点、训练数据与完整推理流水线,以推动后续相关研究!
------
<p align="center">
<img src="figure/teaser_draft_02.png" width="48%" />
<img src="figure/before_after_train_lcb_02.png" width="48%" />
</p>
*图1 | 并行协同推理(PaCoRe)性能表现。左图:在 HMMT 2025 数据集上,PaCoRe-8B 展现出出色的测试时计算缩放能力,性能稳步提升并最终超越 GPT-5。右图:在 LiveCodeBench 基准上,RLVR-8B 模型无法利用增加的测试时计算资源,而 PaCoRe-8B 模型则可随着测试时计算规模提升获得显著性能增益。*
<p align="center">
<img src="figure/train_reward_response_length_1130.png" width="48%" />
<img src="figure/benchmark_accuracy_1130.png" width="48%" />
</p>
*图2 | PaCoRe 训练动态。左图:训练奖励与响应长度稳步提升,证明训练过程稳定且有效。右图:在 HMMT 2025 与 LiveCodeBench(2408-2505)基准上的评估结果。评估采用 PaCoRe 推理设置下的单轮协同推理,配置为 $vec{K} = [16]$。*
## 🔥 最新发布
**[2025/12/09]** 我们正式发布 PaCoRe-8B 生态系统:
* 📝 **深度技术报告**:[**PaCoRe:基于并行协同推理的测试时计算缩放学习**](https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf)
* 🤖 **模型**:
* [PaCoRe-8B](https://huggingface.co/stepfun-ai/PaCoRe-8B):我们最终的 PaCoRe 训练模型检查点!
* [RLVR-8B-0926](https://huggingface.co/stepfun-ai/RLVR-8B-0926):本研究的初始检查点,基于 [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base) 进行了强推理导向的后训练。
* 📚 **数据**:[PaCoRe-Train-8k](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k) 高质量训练语料,包含 `opensource_math`、`public_mathcontest`、`synthetic_math` 与 `code`:
* 🤗 Stage1-3k:[PaCoRe-Train-Stage1-3k](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k/stage1)
* 🤗 Stage2-5k:[PaCoRe-Train-Stage2-5k](https://huggingface.co/datasets/stepfun-ai/PaCoRe-Train-8k/stage2)
## 🔍 实验
<table class="tg">
<thead>
<tr>
<th class="tg-header"></th>
<th class="tg-data">HMMT 2025</th>
<th class="tg-data">LiveCodeBench</th>
<th class="tg-data">HLE<sub>text</sub></th>
<th class="tg-data">MultiChallenge</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-header">GPT-5</td>
<td class="tg-data">93.2 (16k)</td>
<td class="tg-data"><b>83.5</b> (13k)</td>
<td class="tg-data"><b>26.0</b> (14k)</td>
<td class="tg-data"><b>71.1</b> (5.0k)</td>
</tr>
<tr>
<td class="tg-header">Qwen3-235B-思考版</td>
<td class="tg-data">82.3 (32k)</td>
<td class="tg-data">74.5 (21k)</td>
<td class="tg-data">18.2 (23k)</td>
<td class="tg-data">60.3 (1.6k)</td>
</tr>
<tr>
<td class="tg-header">GLM-4.6</td>
<td class="tg-data">88.7 (25k)</td>
<td class="tg-data">79.5 (19k)</td>
<td class="tg-data">17.2 (21k)</td>
<td class="tg-data">54.9 (2.2k)</td>
</tr>
<tr>
<td class="tg-header">DeepSeek-v3.1-Terminus</td>
<td class="tg-data">86.1 (20k)</td>
<td class="tg-data">74.9 (11k)</td>
<td class="tg-data">19.3 (18k)</td>
<td class="tg-data">54.4 (1.1k)</td>
</tr>
<tr class="tg-midrule">
<td class="tg-header">Kimi-K2-思考版</td>
<td class="tg-data">86.5 (33k)</td>
<td class="tg-data">79.2 (25k)</td>
<td class="tg-data">23.9 (29k)</td>
<td class="tg-data">66.4 (1.7k)</td>
</tr>
<tr class="tg-midrule">
<td class="tg-header">RLVR-8B</td>
<td class="tg-data">75.4 (48k)</td>
<td class="tg-data">70.6 (34k)</td>
<td class="tg-data">9.3 (35k)</td>
<td class="tg-data">33.3 (1.7k)</td>
</tr>
<tr>
<td class="tg-header"><b>PaCoRe-8B(低配置)</b></td>
<td class="tg-data">88.2 (243k)</td>
<td class="tg-data">75.8 (188k)</td>
<td class="tg-data">13.0 (196k)</td>
<td class="tg-data">41.8 (13k)</td>
</tr>
<tr>
<td class="tg-header"><b>PaCoRe-8B(中配置)</b></td>
<td class="tg-data">92.9 (869k)</td>
<td class="tg-data">76.7 (659k)</td>
<td class="tg-data">14.6 (694k)</td>
<td class="tg-data">45.7 (45k)</td>
</tr>
<tr class="tg-bottom">
<td class="tg-header"><b>PaCoRe-8B(高配置)</b></td>
<td class="tg-data"><b>94.5</b> (1796k)</td>
<td class="tg-data">78.2 (1391k)</td>
<td class="tg-data">16.2 (1451k)</td>
<td class="tg-data">47.0 (95k)</td>
</tr>
</tbody>
</table>
*表1 | 每个基准任务均报告准确率及总测试时计算量(单位:千 Token)。其中*低配置*、*中配置*、*高配置*分别采用推理轨迹配置 $vec{K}=[4]$、$[16]$ 与 $[32, 4]$。*
### 核心发现
* **消息传递解锁缩放能力**:若不进行信息压缩,模型性能会在上下文限制下陷入停滞。PaCoRe 突破了内存壁垒,实现推理规模的自由拓展。
* **广度优于深度**:并非所有计算资源都能带来同等收益。协同并行推理相比单一推理链拓展,能带来更高的性能回报。
* **数据作为效能倍增器**:PaCoRe 训练语料提供了极具价值的监督信号——即使是基线模型,在该语料上微调后也能获得显著性能提升。
## 快速开始 🚀
### 数据
训练数据以`list[dict]`格式提供,每个条目对应一个训练实例:
* `conversation`:原始问题/提示对话内容
* `responses`:缓存的生成响应列表(探索轨迹),作为 PaCoRe 训练过程中的**输入消息($M$)**
* `ground_truth`:用于正确性评估的可验证标准答案
### 模型部署
可直接使用 `vllm serve` 命令部署该模型!PaCoRe 的更多推理细节将通过推理流水线进行处理。
### 推理流水线

*图3 | PaCoRe 推理流水线。每一轮推理都会启动大规模并行探索,将生成的轨迹压缩为紧凑消息,并将这些消息与问题一同输入以指导下一轮推理。重复该过程 $hat{R}$ 次后,可在固定上下文限制下实现数百万 Token 级的有效测试时计算,最终以压缩后的消息作为模型输出答案。*
推理代码即将上线!
## 🙏 致谢
- 本工作得到了[阶跃星辰(StepFun)](https://www.stepfun.com/)与清华大学提供的计算资源与基础设施支持。
- 我们衷心感谢团队同事的鼎力支持:
* 推理相关:宋远、谢武勋、李明亮、王博俊
* 训练相关:陈星、陆元伟、万昌颐、周宇
* 基础设施运维:庞绍亮、苗昌鑫、赵旭、张伟、杨子东、林俊哲、杨宇翔、徐晨、李鑫、王斌
* 数据管理:任潇潇、黄志国、安康
* 有益讨论:赵亮、孙建健、翁泽嘉、谢晶晶
- 感谢阶跃星辰与清华大学的同事们提供的宝贵反馈与贡献。
- 本工作基于诸多优秀的开源模型与数据构建,在此一并致谢!
## 🔮 未来工作
我们仍处于并行协同推理研究的起步阶段,后续研究路线包括:
- **拓展极限边界**:计划将 PaCoRe 应用于更强的基础模型,拓展任务领域,并进一步拓展并行轨迹的广度与协同轮次的深度,以攻克当前被认为难以解决的挑战。
- **提升 Token 智能密度**:当前我们通过扩大计算量实现性能提升,未来将致力于最大化每单位计算资源的效用,包括通过更高效的轨迹组织、协作与分工,实现更优质的并行探索。
- **涌现式多智能体智能**:我们计划探索合成策略与消息传递机制的联合训练,构建极简且丰富的协作多智能体学习环境,为研究涌现式通信、自组织与集体智能提供优质实验平台。
- **预训练与微调的衔尾蛇循环**:拟基于 PaCoRe 流水线开发先进的合成数据生成技术,以优化当前的预训练与微调流程。
## 📣 招聘启事
我们正在招募富有主动性的工程师与研究员。
如果您对我们的项目感兴趣,希望参与推理系统的规模拓展直至通用人工智能(AGI)目标,请通过邮箱 hanqer@stepfun.com 联系我们。
## 📜 引用格式
bibtex
@misc{pacore2025,
title={PaCoRe: 基于并行协同推理的测试时计算缩放学习},
author={胡景程、张银敏、尚世杰、杨晓波、彭越、黄哲伟、周合斌、吴鑫、程杰、万凡琪、孔祥文、姚承远、黄爱林、周宏宇、韩琪、葛铮、姜大昕、张祥宇、沈向洋},
year={2025},
url={https://github.com/stepfun-ai/PaCoRe/blob/main/pacore_report.pdf},
}
提供机构:
maas
创建时间:
2025-12-10



