public_long_form_thought_data_5k
STILL: Slow Thinking with LLMs
Dataset
- Training data: part of the training data has been open-sourced as public_long_form_thought_data_5k.jsonl, located in the data/ directory.
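Because the file is in JSON Lines format (one JSON object per line), it can be inspected with the Python standard library alone. The sketch below is a minimal example under stated assumptions: the record schema is not described here, so the code makes no assumption about field names and only counts the top-level keys it finds. The helper names `iter_jsonl` and `count_keys` are illustrative, not part of the project.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with path.open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


def count_keys(path: Path) -> Counter:
    """Count how often each top-level key appears across all records."""
    counts: Counter = Counter()
    for record in iter_jsonl(path):
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Path taken from the dataset description; adjust it to your local checkout.
    data_path = Path("data/public_long_form_thought_data_5k.jsonl")
    if data_path.exists():
        for key, n in count_keys(data_path).most_common():
            print(f"{key}: {n}")
    else:
        print(f"{data_path} not found")
```

Running the script from the repository root prints each top-level key with the number of records that contain it, which is a quick way to learn the schema before writing any training code.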
Model
- Model: the open-sourced model STILL-2 is available on Hugging Face.
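Technical Reports
- Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems
  - Report: arXiv:2412.09413
  - Summary: a reproduction report on building an o1-like slow-thinking reasoning system, in which the model is trained with an imitate, explore, and self-improve framework.
- Enhancing LLM Reasoning with Reward-guided Tree Search
  - Report: arXiv:2411.11694
  - Summary: explores how a reward-guided tree search algorithm can strengthen the reasoning ability of LLMs.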
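Future Work
- The authors plan to study how to scale up the capacity of the training method so that it can handle more complex tasks.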
Citation
- If the technical reports are helpful to your research, please cite the following:
@article{Slow_Thinking_with_LLMs_1,
  title={Enhancing LLM Reasoning with Reward-guided Tree Search},
  author={Jiang, Jinhao and Chen, Zhipeng and Min, Yingqian and Chen, Jie and Cheng, Xiaoxue and Wang, Jiapeng and Tang, Yiru and Sun, Haoxiang and Deng, Jia and Zhao, Wayne Xin and Liu, Zheng and Yan, Dong and Xie, Jian and Wang, Zhongyuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2411.11694},
  year={2024}
}
@article{Slow_Thinking_with_LLMs_2,
  title={Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems},
  author={Min, Yingqian and Chen, Zhipeng and Jiang, Jinhao and Chen, Jie and Deng, Jia and Hu, Yiwen and Tang, Yiru and Wang, Jiapeng and Cheng, Xiaoxue and Song, Huatong and Zhao, Wayne Xin and Liu, Zheng and Wang, Zhongyuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2412.09413},
  year={2024}
}




