ReasonFlux-V2-SFT
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-05.
Download link:
https://modelscope.cn/datasets/Gen-Verse/ReasonFlux-V2-SFT
Description:
**ReasonFlux-V2** is our new template-augmented reasoning paradigm, which **internalizes thought templates** through **iterative hierarchical reinforcement learning**. Specifically, we first develop an automated pipeline to extract thought templates from the problem–solution pairs in the training set. To internalize these high-level thought templates effectively and learn a more efficient reasoning paradigm, we propose two collaborative modules: the **Template Proposer**, which adaptively proposes suitable thought templates based on the input problem, and the **Template Reasoner**, which accurately instantiates the proposed templates and performs precise, detailed reasoning. Building on these modules, we iteratively apply **hierarchical RL** to optimize both.
ReasonFlux-V2 consists of two main modules:
1. **Template Proposer**, which **adaptively** proposes suitable high-level thought templates based on the input problem. It plays the role of human intuitive thinking: it helps **narrow the exploration space** of the detailed reasoning process and thereby **improves solution efficiency**.
2. **Template Reasoner**, which follows the proposed high-level thought template to solve the corresponding problem efficiently and effectively.
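The division of labor between the two modules can be illustrated with a minimal sketch. This is not the ReasonFlux-V2 implementation: `solve`, `toy_proposer`, `toy_reasoner`, and the `Solution` record are hypothetical names, and the toy functions are rule-based stand-ins for what would in practice be calls to trained language models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Solution:
    problem: str
    template: str  # high-level thought template chosen in stage 1
    answer: str    # detailed reasoning produced in stage 2


def solve(
    problem: str,
    proposer: Callable[[str], str],
    reasoner: Callable[[str, str], str],
) -> Solution:
    """Two-stage inference: propose a template, then reason under it."""
    template = proposer(problem)          # stage 1: pick a high-level plan
    answer = reasoner(problem, template)  # stage 2: instantiate the plan
    return Solution(problem, template, answer)


# Hypothetical stand-ins so the sketch runs without any model weights.
def toy_proposer(problem: str) -> str:
    if problem.lower().startswith("solve"):
        return "Isolate the unknown, then evaluate."
    return "Evaluate the expression directly."


def toy_reasoner(problem: str, template: str) -> str:
    return f"[following: {template}] worked solution for: {problem}"


result = solve("Solve 2x + 3 = 11 for x.", toy_proposer, toy_reasoner)
print(result.template)
print(result.answer)
```

Because the reasoner only ever sees the problem together with a single proposed template, its search is conditioned on that plan, which is the sense in which the proposer narrows the exploration space.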
**This dataset is the SFT dataset for the Template Proposer.** The remaining models and datasets are available below:
[Template](Gen-Verse/ReasonFlux-V2-Template)|[SFT Dataset](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-SFT/) |[DPO Dataset (Proposer)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-DPO)|[DPO Dataset (Reasoner)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO)
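As a starting point for inspecting the data, the sketch below fetches the SFT dataset from the Hugging Face Hub using the repository ID from the link above. It assumes the `datasets` library is installed and that the repository loads with its default configuration; the helper name `load_sft_dataset` is hypothetical, and no split or column names are assumed because the card does not list them.

```python
DATASET_ID = "Gen-Verse/ReasonFlux-V2-SFT"  # repository ID taken from the link above


def load_sft_dataset():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so that defining this helper does not require the library.
    from datasets import load_dataset  # pip install datasets

    # No split is requested, so whatever splits the repository defines are returned.
    return load_dataset(DATASET_ID)
```

Calling `load_sft_dataset()` and printing the result should list the available splits and columns, which is the safest way to discover the schema before writing any preprocessing code.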
## Citation
```bibtex
@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}
```
Provider:
maas
Created:
2025-05-26



