ReasonFlux-V2-Reasoner-DPO

Name: ReasonFlux-V2-Reasoner-DPO
Creator: maas
Published: 2025-12-05 16:40:34
License: No description available

Source: ModelScope community. Updated 2025-12-05; indexed 2025-07-05.

Download link:

https://modelscope.cn/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO

Resource description:

**ReasonFlux-V2** is our new template-augmented reasoning paradigm, which **internalizes thought templates** through **iterative hierarchical reinforcement learning**. Specifically, we first develop an automated pipeline to extract thought templates from the problem–solution pairs in the training set. To effectively internalize these high-level thought templates and learn a more efficient reasoning paradigm, we propose two collaborative modules: the **Template Proposer**, which adaptively proposes suitable thought templates based on the input problem, and the **Template Reasoner**, which precisely instantiates the proposed templates and performs detailed reasoning. Building on these modules, we iteratively conduct **hierarchical RL** to optimize both.

ReasonFlux-V2 consists of two main modules:

1. **Template Proposer**, which **adaptively** proposes suitable high-level thought templates based on the input problem. It plays the role of human intuitive thinking, which helps **narrow the exploration space** of the detailed reasoning process and thus **improves solution efficiency**.
2. **Template Reasoner**, which follows the proposed high-level thought template to solve the corresponding problem efficiently and effectively.

**This dataset is the DPO dataset for the Template Reasoner.** The rest of the models and datasets are available below:

[Code](https://github.com/Gen-Verse/ReasonFlux) | [Template](Gen-Verse/ReasonFlux-V2-Template) | [SFT Dataset](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-SFT/) | [DPO Dataset (Proposer)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-DPO) | [DPO Dataset (Reasoner)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO)

## Citation

```bibtex
@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}
```
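For readers unfamiliar with how a DPO (Direct Preference Optimization) dataset is typically consumed, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the ReasonFlux code. The record fields `prompt`, `chosen`, and `rejected`, the function name `dpo_loss`, the value of `beta`, and the log-probability numbers are all illustrative assumptions; this page does not document the dataset's actual schema or training settings.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a response
    under either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one stated by the dataset authors.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z)
    return softplus(-logits)


# Hypothetical record layout (field names are assumptions):
record = {
    "prompt": "Problem statement plus the proposed thought template",
    "chosen": "Reasoning that follows the template and reaches a correct answer",
    "rejected": "Reasoning that deviates from the template or is incorrect",
}

# Made-up log-probabilities, used only to exercise the function.
loss = dpo_loss(policy_chosen_logp=-5.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-10.0, ref_rejected_logp=-10.0)
print(f"DPO loss for the example pair: {loss:.4f}")
```

The loss falls below log 2 when the policy favors the chosen response more strongly than the reference model does, and rises above it in the opposite case.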

Provider:

maas

Created:

2025-07-04
