ReasonFlux-V2-SFT

Name: ReasonFlux-V2-SFT
Creator: maas
Published: 2025-12-05 16:36:04
License: No description available

ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-05.

Download link:

https://modelscope.cn/datasets/Gen-Verse/ReasonFlux-V2-SFT


Resource description:

**ReasonFlux-V2** is our new template-augmented reasoning paradigm, which **internalizes thought templates** through **iterative hierarchical reinforcement learning**. Specifically, we first develop an automated pipeline to extract thought templates from the problem–solution pairs in the training set. To internalize these high-level thought templates and learn a more efficient reasoning paradigm, we propose two collaborative modules: **Template Proposer**, which adaptively proposes suitable thought templates based on the input problem, and **Template Reasoner**, which instantiates the proposed templates and performs precise, detailed reasoning. Building upon these modules, we iteratively conduct **hierarchical RL** to optimize both of them.

ReasonFlux-V2 consists of two main modules:

1. **Template Proposer**, which **adaptively** proposes suitable high-level thought templates based on the input problem. It plays the role of human intuitive thinking: it helps **narrow the exploration space** of the detailed reasoning process and thus **improves solution efficiency**.
2. **Template Reasoner**, which follows the proposed high-level thought template to solve the corresponding problem efficiently and effectively.

**This dataset is the SFT dataset for Template Proposer.** The rest of the models and datasets are available below:

[Template](Gen-Verse/ReasonFlux-V2-Template) | [SFT Dataset](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-SFT/) | [DPO Dataset (Proposer)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-DPO) | [DPO Dataset (Reasoner)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO)

## Citation

```bibtex
@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}
```
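The two-module design described in the card can be pictured as a two-stage inference call: a proposer produces a high-level template, and a reasoner solves the problem conditioned on it. The sketch below is only an illustration of that control flow, not the authors' implementation. `solve_with_template`, `Solution`, `toy_proposer`, and `toy_reasoner` are hypothetical names, and the toy functions are plain-Python stand-ins for the actual models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Solution:
    problem: str
    template: str  # high-level thought template from the proposer
    answer: str    # detailed reasoning produced under that template


def solve_with_template(
    problem: str,
    propose: Callable[[str], str],
    reason: Callable[[str, str], str],
) -> Solution:
    """Two-stage inference: propose a template, then reason under it."""
    template = propose(problem)         # stage 1: pick a high-level plan
    answer = reason(problem, template)  # stage 2: reason conditioned on the plan
    return Solution(problem, template, answer)


# Hypothetical stand-ins for the Template Proposer and Template Reasoner.
def toy_proposer(problem: str) -> str:
    if "sum" in problem.lower():
        return "Identify the sequence type, then apply its closed-form sum."
    return "Restate the problem, then solve step by step."


def toy_reasoner(problem: str, template: str) -> str:
    # A real reasoner would generate a full solution; this only echoes the plan.
    return f"[following: {template}] worked solution for: {problem}"


if __name__ == "__main__":
    result = solve_with_template(
        "Find the sum of the integers from 1 to 100.", toy_proposer, toy_reasoner
    )
    print(result.template)
    print(result.answer)
```

Keeping the proposer and reasoner as separate callables mirrors the card's description of two modules that are optimized separately; in practice each would be backed by a language model rather than a hand-written function.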


Provider:

maas

Created:

2025-05-26
