spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT

Name: spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT
Creator: spiral-rl
Published: 2025-07-05 07:35:14
License: Not specified

Source: Hugging Face. Updated: 2025-07-05. Indexed: 2025-10-25.

Download link:

https://hf-mirror.com/datasets/spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT

Description:

This expert dataset was collected by running Qwen3-32B in self-play on Kuhn Poker and keeping only the winning trajectories. It is part of the SPIRAL project, a self-play framework in which models learn by playing multi-turn, zero-sum games against continuously improving versions of themselves, which generates an infinite curriculum of progressively more challenging problems. The dataset provides high-quality trajectories from this process, which are important for developing transferable reasoning capabilities in language models.
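The "keep the winning trajectories" step can be illustrated with a minimal sketch. This is not the SPIRAL pipeline: the policy here is a hypothetical random stand-in for Qwen3-32B, and the record fields (`player`, `reward`, `moves`) are assumptions made for illustration, since this listing does not describe the dataset schema. The sketch only shows the idea of playing zero-sum Kuhn Poker hands and retaining the winner's side of each hand.

```python
import random

CARDS = ["J", "Q", "K"]  # ranked low to high
CODES = {"check": "c", "bet": "b", "fold": "f", "call": "k"}


def random_policy(card, history, rng):
    """Hypothetical stand-in for the language model: random legal action."""
    facing_bet = history.endswith("b")
    return rng.choice(["fold", "call"] if facing_bet else ["check", "bet"])


def play_hand(policy, rng):
    """Play one hand (ante 1, bet size 1); return per-player moves and player 0's payoff."""
    cards = rng.sample(CARDS, 2)  # one distinct card per player
    history, player = "", 0
    moves = {0: [], 1: []}
    while True:
        action = policy(cards[player], history, rng)
        moves[player].append(
            {"card": cards[player], "history": history, "action": action}
        )
        history += CODES[action]
        if action == "fold":
            # The folding player loses their ante.
            return moves, (-1 if player == 0 else 1)
        if action == "call" or history == "cc":
            # Showdown: the higher card wins 2 after a called bet, else 1.
            stake = 2 if action == "call" else 1
            p0_wins = CARDS.index(cards[0]) > CARDS.index(cards[1])
            return moves, (stake if p0_wins else -stake)
        player = 1 - player


def collect_winning_trajectories(n_hands, seed=0):
    """Self-play n_hands and keep only the winning player's moves from each hand."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_hands):
        moves, payoff0 = play_hand(random_policy, rng)
        winner = 0 if payoff0 > 0 else 1  # zero-sum game with no ties
        kept.append(
            {"player": winner, "reward": abs(payoff0), "moves": moves[winner]}
        )
    return kept
```

Calling `collect_winning_trajectories(1000)` returns one record per hand, each holding only the winner's decisions. In a real setup the random policy would be replaced by model-generated actions, and any further filtering would follow the project's own criteria.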

Provided by:

spiral-rl
