ReasonFlux-V2-Reasoner-DPO
Source: ModelScope community. Updated 2025-12-05; indexed 2025-07-05.
Download link:
https://modelscope.cn/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO
Resource description:
**ReasonFlux-V2** is our new template-augmented reasoning paradigm, which **internalizes thought templates** through **iterative hierarchical reinforcement learning**. Specifically, we first develop an automated pipeline to extract thought templates from the problem–solution pairs in the training set. To internalize these high-level thought templates effectively and learn a more efficient reasoning paradigm, we propose two collaborative modules: the **Template Proposer**, which adaptively proposes suitable thought templates based on the input problem, and the **Template Reasoner**, which instantiates the proposed templates and performs precise, detailed reasoning. Building on these modules, we iteratively apply **hierarchical RL** to optimize both of them.
ReasonFlux-V2 consists of two main modules:
1. **Template Proposer**, which **adaptively** proposes suitable high-level thought templates based on the input problem. It plays the role of human intuitive thinking: it **narrows the exploration space** of the detailed reasoning process and thereby **improves solution efficiency**.
2. **Template Reasoner**, which follows the proposed high-level thought template to solve the corresponding problem efficiently and effectively.
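The division of labor between the two modules can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the `TEMPLATES` dictionary, the keyword rule in `propose_template`, and the prompt layout in `build_reasoner_prompt` are hypothetical stand-ins, not the ReasonFlux-V2 models or their actual prompt format.

```python
from typing import Callable

# Hypothetical template library; names and texts are illustrative only.
TEMPLATES = {
    "quadratic": "Rewrite as ax^2 + bx + c = 0, then apply the quadratic formula.",
    "default": "Restate the problem, list the knowns, then solve step by step.",
}


def propose_template(problem: str) -> str:
    """Toy stand-in for the Template Proposer (keyword rule, not a learned model)."""
    if "x^2" in problem:
        return TEMPLATES["quadratic"]
    return TEMPLATES["default"]


def build_reasoner_prompt(problem: str, template: str) -> str:
    """Assumed input layout for the Template Reasoner: problem plus proposed template."""
    return f"Problem: {problem}\nThought template: {template}\nSolution:"


def solve(problem: str, reasoner: Callable[[str], str]) -> str:
    """Two-stage flow: propose a template first, then reason under that template."""
    template = propose_template(problem)
    return reasoner(build_reasoner_prompt(problem, template))
```

In this sketch `reasoner` is any callable that maps a prompt string to a solution string, so a real model client could be passed in its place.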
**This dataset is the DPO dataset for the Template Reasoner.** The rest of the models and datasets are available below:
[Code](https://github.com/Gen-Verse/ReasonFlux)|[Template](Gen-Verse/ReasonFlux-V2-Template)|[SFT Dataset](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-SFT/) |[DPO Dataset (Proposer)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-DPO)|[DPO Dataset (Reasoner)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO)
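For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO objective that a preference dataset like this one is typically used with. It is not taken from the ReasonFlux training code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions, and the inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the card does not state one
) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the loss is log(2) for any pair.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Raising the chosen response's likelihood relative to the rejected one lowers the loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Training frameworks normally compute this loss over batches of token-level log-probabilities; the scalar version here only shows the shape of the objective.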
## Citation
```bibtex
@article{yang2025reasonflux,
title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
journal={arXiv preprint arXiv:2502.06772},
year={2025}
}
```
Provider:
maas
Created:
2025-07-04



