
ReasonFlux-V2-SFT

ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-05.
Download link:
https://modelscope.cn/datasets/Gen-Verse/ReasonFlux-V2-SFT
Resource overview:
**ReasonFlux-V2** is our new template-augmented reasoning paradigm, which **internalizes thought templates** through **iterative hierarchical reinforcement learning**. Specifically, we first develop an automated pipeline to extract thought templates from the problem-solution pairs in the training set. To internalize these high-level thought templates effectively and learn a more efficient reasoning paradigm, we propose two collaborative modules: the **Template Proposer**, which adaptively proposes suitable thought templates based on the input problem, and the **Template Reasoner**, which instantiates the proposed templates exactly and performs precise, detailed reasoning. Building on these modules, we iteratively conduct **hierarchical RL** to optimize both of them.

ReasonFlux-V2 consists of two main modules:

1. **Template Proposer**, which **adaptively** proposes suitable high-level thought templates based on the input problem. It plays the role of the intuitive thinking process in humans: it helps **narrow the exploration space** of the detailed reasoning process and thus **improves solution efficiency**.
2. **Template Reasoner**, which follows the proposed high-level thought template to solve the corresponding problem efficiently and effectively.

**This dataset is the SFT dataset for the Template Proposer.** The remaining models and datasets are available below:

[Template](Gen-Verse/ReasonFlux-V2-Template) | [SFT Dataset](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-SFT/) | [DPO Dataset (Proposer)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-DPO) | [DPO Dataset (Reasoner)](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-V2-Reasoner-DPO)

## Citation

```bibtex
@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}
```
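
To illustrate the Proposer/Reasoner split described in the overview, here is a minimal sketch of two-stage inference in Python. It is not the authors' implementation: `solve`, `toy_proposer`, and `toy_reasoner` are hypothetical names, and the toy functions are hand-written stand-ins for the trained models, used only to show how a proposed template narrows what the reasoner has to do.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Solution:
    template: str  # high-level thought template chosen in stage 1
    answer: str    # detailed result produced in stage 2


def solve(
    problem: str,
    propose: Callable[[str], str],
    reason: Callable[[str, str], str],
) -> Solution:
    """Two-stage inference: propose a template, then instantiate it."""
    template = propose(problem)         # stage 1: Template Proposer
    answer = reason(problem, template)  # stage 2: Template Reasoner
    return Solution(template=template, answer=answer)


# Hypothetical stand-ins for the two trained models (assumption, for illustration).
def toy_proposer(problem: str) -> str:
    if "sum" in problem.lower():
        return "Use the closed-form formula n(n+1)/2."
    return "Work through the problem step by step."


def toy_reasoner(problem: str, template: str) -> str:
    if "n(n+1)/2" in template:
        # Read n from the last token of the problem statement.
        n = int(problem.rstrip("?. ").split()[-1])
        return str(n * (n + 1) // 2)
    return "unknown"


result = solve(
    "What is the sum of the integers from 1 to 100?",
    toy_proposer,
    toy_reasoner,
)
print(result.template)
print(result.answer)
```

In a real setup, both callables would wrap language-model calls; the point of the sketch is only that the reasoner receives the problem together with the proposed template rather than the problem alone.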

Provider:
maas
Created:
2025-05-26