AntimLabs/FlappyBird-SFT
Source: Hugging Face. Updated 2025-12-03; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/AntimLabs/FlappyBird-SFT
Description:
---
license: mit
task_categories:
- reinforcement-learning
- text-generation
tags:
- flappy-bird
- reinforcement-learning
- game-ai
- sft
---
# FlappyBird SFT Dataset
This dataset contains supervised fine-tuning (SFT) data for a Flappy Bird game control policy.
## Dataset Details
- **Source**: Evaluation results from `AntimLabs/Qwen2.5-1.5B-Instruct-FlappyBird-RL-71`
- **Format**: Standard SFT format with `prompt` and `completion` columns
- **Examples**: 710 game episodes
- **Total conversation turns**: ~147k
## Data Format
Each example has two columns:
- **prompt**: List of messages containing the system prompt and the initial user observation
- **completion**: List of messages containing the rest of the game conversation (alternating assistant actions and user observations)
```python
{
    "prompt": [
        {"role": "system", "content": "You are the Flappy Bird control policy..."},
        {"role": "user", "content": "<FLAPPY id=1>..."}
    ],
    "completion": [
        {"role": "assistant", "content": "<ACTIONS>[]</ACTIONS>"},
        {"role": "user", "content": "<FLAPPY id=2>..."},
        {"role": "assistant", "content": "<ACTIONS>[TAP]</ACTIONS>"},
        ...
    ]
}
```
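The two columns can be joined back into a single conversation, and the assistant turns can be parsed for their actions. The following is a minimal sketch, not part of the dataset's tooling: `full_conversation` and `parse_actions` are hypothetical helpers, the sample message bodies are placeholders, and the comma-separated handling of multiple actions is an assumption (the card only shows `[]` and `[TAP]`).

```python
import re

# Matches the action wrapper shown in the card, e.g. "<ACTIONS>[TAP]</ACTIONS>".
ACTIONS_RE = re.compile(r"<ACTIONS>\[(.*?)\]</ACTIONS>", re.DOTALL)


def full_conversation(example):
    """Concatenate prompt and completion into one message list."""
    return list(example["prompt"]) + list(example["completion"])


def parse_actions(content):
    """Return the action tokens in an assistant message, or None if no tag is found.

    Assumption: if several actions appear, they are comma-separated.
    """
    match = ACTIONS_RE.search(content)
    if match is None:
        return None
    inner = match.group(1).strip()
    return [tok.strip() for tok in inner.split(",") if tok.strip()]


# Hypothetical episode; the message bodies are placeholders, not real data.
sample = {
    "prompt": [
        {"role": "system", "content": "system prompt placeholder"},
        {"role": "user", "content": "observation 1 placeholder"},
    ],
    "completion": [
        {"role": "assistant", "content": "<ACTIONS>[]</ACTIONS>"},
        {"role": "user", "content": "observation 2 placeholder"},
        {"role": "assistant", "content": "<ACTIONS>[TAP]</ACTIONS>"},
    ],
}

messages = full_conversation(sample)
actions = [parse_actions(m["content"]) for m in messages if m["role"] == "assistant"]
print(len(messages), actions)  # 5 [[], ['TAP']]
```

Returning `None` for a missing tag keeps malformed assistant turns distinguishable from a deliberate empty action list.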
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("AntimLabs/FlappyBird-SFT", split="train")
```
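Once loaded, a quick sanity check is to count messages per role across episodes. The sketch below is illustrative only: `count_turns` is a hypothetical helper, and it runs on a small in-memory list so that it works without downloading anything. How the card counts a "turn" in its ~147k figure is not stated, so the totals may not match that number exactly.

```python
def count_turns(examples):
    """Count messages per role across the prompt and completion of each episode."""
    counts = {}
    for example in examples:
        for message in list(example["prompt"]) + list(example["completion"]):
            role = message["role"]
            counts[role] = counts.get(role, 0) + 1
    return counts


# Hypothetical stand-in for dataset rows; contents are placeholders.
episode = {
    "prompt": [
        {"role": "system", "content": "system prompt placeholder"},
        {"role": "user", "content": "observation 1 placeholder"},
    ],
    "completion": [
        {"role": "assistant", "content": "<ACTIONS>[]</ACTIONS>"},
        {"role": "user", "content": "observation 2 placeholder"},
        {"role": "assistant", "content": "<ACTIONS>[TAP]</ACTIONS>"},
    ],
}
toy_examples = [episode, episode]

turn_counts = count_turns(toy_examples)
print(turn_counts)  # {'system': 2, 'user': 4, 'assistant': 4}
```

Because iterating a `datasets.Dataset` yields one dict per row, the same function should accept the `dataset` object from the snippet above in place of `toy_examples`.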
Provider:
AntimLabs



