ReTool-SFT
ModelScope Community, updated 2025-12-11, indexed 2025-05-03
Download link:
https://modelscope.cn/datasets/AI-ModelScope/ReTool-SFT
Description:
# ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
In this work, we adopt the reinforcement learning (RL) paradigm and introduce ReTool, a Tool-augmented Reinforcement learning framework explicitly designed to guide LLMs toward optimal strategies for leveraging external computational tools during reasoning. Comprehensive experiments on AIME 2024 and AIME 2025 show that ReTool not only achieves higher accuracy than conventional text-based RL approaches but also converges in significantly fewer training steps.
🚀 With Qwen2.5-32B-Instruct as the base model, ReTool reaches 67.0% accuracy on AIME 2024 and 49.3% on AIME 2025, outperforming the text-based RL baseline while using fewer than 50% of its training steps.
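This card does not show what a tool-augmented rollout looks like. As a rough illustration of "leveraging external computational tools during reasoning", the Python sketch below interleaves model generation with code execution: whenever the model emits a code snippet, the snippet is run and its output is appended to the transcript before generation continues. The `<code>`/`<interpreter>` tag format, the `generate` callback, and the names `run_python` and `tool_augmented_rollout` are assumptions for illustration, not ReTool's released implementation. A subprocess with a timeout is also not a real security sandbox.

```python
import re
import subprocess
import sys
from typing import Callable

# Assumed tag format (hypothetical): the model wraps Python in <code>...</code>,
# and the runtime feeds results back inside <interpreter>...</interpreter>.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_python(source: str, timeout: float = 5.0) -> str:
    """Run a snippet in a separate Python process; return stdout, or stderr on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution exceeded the time limit"
    return proc.stdout if proc.returncode == 0 else proc.stderr


def tool_augmented_rollout(
    generate: Callable[[str], str], prompt: str, max_tool_calls: int = 4
) -> str:
    """Alternate model generation and code execution (hypothetical sketch).

    `generate` stands in for the policy model: given the transcript so far,
    it returns the next chunk, ending either after a </code> tag or at the
    end of its answer.
    """
    transcript = prompt
    for turn in range(max_tool_calls + 1):
        chunk = generate(transcript)
        transcript += chunk
        match = CODE_RE.search(chunk)  # only the first code block in a chunk is run
        if match is None or turn == max_tool_calls:
            break  # final answer reached, or tool-call budget exhausted
        transcript += f"<interpreter>{run_python(match.group(1))}</interpreter>"
    return transcript


def scripted_policy(transcript: str) -> str:
    """Toy stand-in for a model: calls the tool once, then answers."""
    if "<interpreter>" not in transcript:
        return "Let me compute 2**10.\n<code>print(2**10)</code>"
    return "\nThe answer is 1024."


if __name__ == "__main__":
    print(tool_augmented_rollout(scripted_policy, "Q: What is 2^10?\n"))
```

In RL training, a loop of this kind would produce the trajectories that are scored and used for policy updates; the scripted policy here only demonstrates the control flow.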
- Project Page: https://retool-rl.github.io/
### Citation
If you find our project helpful, please cite:
```
@misc{feng2025retoolreinforcementlearningstrategic,
title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs},
author={Jiazhan Feng and Shijue Huang and Xingwei Qu and Ge Zhang and Yujia Qin and Baoquan Zhong and Chengquan Jiang and Jinxin Chi and Wanjun Zhong},
year={2025},
eprint={2504.11536},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.11536},
}
```
Provided by: maas
Created: 2025-04-26



