TreePO_data
Source: ModelScope community. Updated 2025-12-10; indexed 2025-09-06.
Download link:
https://modelscope.cn/datasets/m-a-p/TreePO_data
Resource description:
We release the resources for the paper [TreePO](https://arxiv.org/abs/2508.17445):
- Checkpoint with average weighted subgroup advantages + more diverse initial divergence ([the final one](https://huggingface.co/m-a-p/TreePO-Qwen2.5-7B)).
- Checkpoint with average weighted subgroup advantages + [fixed divergence](https://huggingface.co/m-a-p/TreePO-Qwen2.5-7B_fixed-div).
- The [training dataset](https://huggingface.co/datasets/m-a-p/TreePO_data), consisting of deepscaler and simplerl math reasoning data. **← You are here.**
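
As a minimal sketch of how the training dataset listed above might be loaded, the snippet below uses the Hugging Face `datasets` library. It assumes the repository loads with its default configuration. The split and column names are not stated on this page, so nothing about them is hard-coded, and the helper names `load_treepo_data` and `describe` are hypothetical.

```python
REPO_ID = "m-a-p/TreePO_data"  # dataset repository named in the list above


def load_treepo_data(repo_id: str = REPO_ID):
    """Download the dataset from the Hugging Face Hub (needs network access).

    Assumption: the repository loads with its default configuration. If it
    defines several configurations, `load_dataset` will ask for a name.
    """
    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def describe(dataset_dict) -> dict:
    """Return {split_name: number_of_rows} for a mapping of split -> dataset."""
    return {name: len(split) for name, split in dataset_dict.items()}


# Usage (requires `pip install datasets` and network access):
#   data = load_treepo_data()
#   print(describe(data))
```

Inspecting the split sizes first is a cheap way to check what the repository actually contains before writing any training code against it.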
More links:
- [Huggingface Paper](https://huggingface.co/papers/2508.17445)
- [Project Page](https://m-a-p.ai/TreePO)
- [X/Twitter Thread](https://x.com/yizhilll/status/1960616873180954854)
- [Github Repo](https://github.com/multimodal-art-projection/TreePO)
If you find this work useful, please consider citing the paper:
```bibtex
@misc{li2025treepo,
  title={TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling},
  author={Yizhi Li and Qingshui Gu and Zhoufutu Wen and Ziniu Li and Tianshun Xing and Shuyue Guo and Tianyu Zheng and Xin Zhou and Xingwei Qu and Wangchunshu Zhou and Zheng Zhang and Wei Shen and Qian Liu and Chenghua Lin and Jian Yang and Ge Zhang and Wenhao Huang},
  year={2025},
  eprint={2508.17445},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.17445},
  howpublished={\url{https://m-a-p.ai/TreePO}}
}
```
Provider:
maas
Created:
2025-08-28



