spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT
Source: Hugging Face | Updated: 2025-07-05 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT
Description:
This expert dataset was collected by having Qwen3-32B play Kuhn Poker against itself (self-play) and keeping only the winning trajectories. It is part of the SPIRAL project, a self-play framework in which models learn by playing multi-turn, zero-sum games against continuously improving versions of themselves. Because the opponent keeps getting stronger, this process generates an effectively infinite curriculum of progressively harder problems. The dataset provides high-quality trajectories from this process, which the project presents as important for developing transferable reasoning capabilities in language models.
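The collection procedure described above (self-play on Kuhn Poker, then keep the winning side) can be illustrated with a small sketch. It assumes the standard three-card Kuhn Poker rules (J < Q < K, ante of 1, one bet of size 1), uses a uniform-random `random_policy` as a hypothetical stand-in for Qwen3-32B, and interprets "winning trajectory" as the winner's sequence of (observation, action) turns in each hand. All names here are hypothetical; this is not the SPIRAL code and does not reflect the dataset's actual schema.

```python
import random
from dataclasses import dataclass, field

CARDS = ["J", "Q", "K"]
RANK = {c: i for i, c in enumerate(CARDS)}

# "p" = pass (check, or fold when facing a bet); "b" = bet (or call when facing a bet).
# These are the terminal betting histories of standard Kuhn Poker.
TERMINAL = {"pp", "bb", "bp", "pbp", "pbb"}


@dataclass
class Episode:
    cards: tuple
    history: str = ""
    # One trajectory per seat: a list of (observation, action) pairs.
    turns: dict = field(default_factory=lambda: {0: [], 1: []})


def payoff_for_player0(cards, history):
    """Net chips won by player 0 at a terminal history; player 1 gets the negative (zero-sum)."""
    if history == "bp":   # player 1 folds to player 0's bet
        return 1
    if history == "pbp":  # player 0 folds to player 1's bet
        return -1
    # Showdown: the ante is 1, plus 1 more if a bet was called.
    stake = 2 if history in ("bb", "pbb") else 1
    return stake if RANK[cards[0]] > RANK[cards[1]] else -stake


def play_episode(policy, rng):
    """Self-play one hand: the same policy controls both seats."""
    cards = tuple(rng.sample(CARDS, 2))
    ep = Episode(cards=cards)
    while ep.history not in TERMINAL:
        player = len(ep.history) % 2        # seats alternate, player 0 acts first
        obs = (cards[player], ep.history)   # a player sees only its own card and the betting so far
        action = policy(obs, rng)
        ep.turns[player].append((obs, action))
        ep.history += action
    return ep


def random_policy(obs, rng):
    # Hypothetical placeholder for the language model; it ignores obs and picks uniformly.
    return rng.choice("pb")


def collect_winning_trajectories(policy, n_games, seed=0):
    """Play n_games of self-play and keep the winning seat's trajectory from each hand."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_games):
        ep = play_episode(policy, rng)
        r0 = payoff_for_player0(ep.cards, ep.history)
        winner = 0 if r0 > 0 else 1  # no ties: the two dealt cards always differ
        kept.append({"cards": ep.cards, "history": ep.history,
                     "winner": winner, "turns": ep.turns[winner]})
    return kept


if __name__ == "__main__":
    for record in collect_winning_trajectories(random_policy, n_games=3):
        print(record)
```

Because every Kuhn Poker hand has exactly one winner, this sketch keeps one trajectory per game; a real pipeline would replace `random_policy` with calls to the model and could apply further filtering.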
Provided by:
spiral-rl



