stablegradients/maze-17x17-500k-general
Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/stablegradients/maze-17x17-500k-general
Resource summary:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- maze
- reasoning
- reinforcement-learning
- sft
size_categories:
- 100K<n<1M
configs:
- config_name: binary
  data_files:
  - split: train
    path: train_binary.parquet
  - split: test
    path: test_binary.parquet
- config_name: continuous
  data_files:
  - split: train
    path: train_continuous.parquet
  - split: test
    path: test_continuous.parquet
- config_name: distance
  data_files:
  - split: train
    path: train_distance.parquet
  - split: test
    path: test_distance.parquet
---
# Maze 17x17 500k — General Paths
Supervised fine-tuning corpus of 17x17 mazes in which the target is a valid (not necessarily shortest) path from START to GOAL. Mazes are generated with Prim's algorithm.
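For readers unfamiliar with the generator, here is a minimal sketch of randomized Prim's algorithm on a 17x17 grid. It is not the dataset's generation script. The cell layout (cells at odd coordinates with walls in between) and the corner placement of START and GOAL are assumptions, and `prim_maze` is a hypothetical helper.

```python
import random
from typing import Iterator, List, Optional, Tuple


def prim_maze(size: int = 17, seed: Optional[int] = None) -> List[List[str]]:
    """Generate a size x size maze with randomized Prim's algorithm.

    Hypothetical layout: maze cells sit at odd (row, col) coordinates, every
    position starts as WALL, and joining two neighbouring cells opens the
    wall position between them.
    """
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd integer >= 3")
    rng = random.Random(seed)
    grid = [["WALL"] * size for _ in range(size)]

    def neighbours(r: int, c: int) -> Iterator[Tuple[int, int]]:
        # Cells two positions away in the four cardinal directions.
        for dr, dc in ((-2, 0), (2, 0), (0, -2), (0, 2)):
            nr, nc = r + dr, c + dc
            if 0 < nr < size - 1 and 0 < nc < size - 1:
                yield nr, nc

    start = (1, 1)
    grid[1][1] = "PATH"
    # Frontier of (cell, parent) pairs: cells adjacent to the carved tree.
    frontier = [(n, start) for n in neighbours(*start)]
    while frontier:
        # Prim's step: pick a random frontier edge.
        (r, c), (pr, pc) = frontier.pop(rng.randrange(len(frontier)))
        if grid[r][c] != "WALL":
            continue  # already joined to the tree via another parent
        grid[r][c] = "PATH"
        grid[(r + pr) // 2][(c + pc) // 2] = "PATH"  # open the wall between
        for n in neighbours(r, c):
            if grid[n[0]][n[1]] == "WALL":
                frontier.append((n, (r, c)))
    # Hypothetical endpoint placement in opposite corners.
    grid[1][1] = "START"
    grid[size - 2][size - 2] = "GOAL"
    return grid


if __name__ == "__main__":
    for row in prim_maze(seed=0):
        print("".join("#" if t == "WALL" else "." for t in row))
```

Because each cell is carved exactly once from a single parent, the open cells form a spanning tree. The result is a "perfect" maze with exactly one simple path between any two cells. In such a maze, any valid path that ends at GOAL contains the unique simple path, possibly with extra back-and-forth detours.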
## Splits and configs
- **train**: 450,000 examples
- **test**: 512 examples
Three reward configurations are provided. They share the same prompts but
differ in the reward-model metadata used by downstream RL:
| config | reward signal |
|--------------|-------------------------------------------|
| `binary` | 1 if the path reaches the goal, else 0 |
| `continuous` | fractional progress toward the goal |
| `distance` | goal reached + solution-quality component |
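The card does not give the exact reward formulas, so the following is only a hypothetical illustration of how the three signals could be computed. It assumes the grid is a list of token rows and a path is a list of `(row, col)` cells. The BFS-based progress measure and the 0.5/0.5 weighting in `distance_reward` are illustrative choices, not the dataset's definitions.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical reward definitions for illustration; not taken from the dataset.
Cell = Tuple[int, int]
Grid = List[List[str]]  # tokens: "WALL", "PATH", "START", "GOAL"


def find(grid: Grid, token: str) -> Cell:
    for r, row in enumerate(grid):
        for c, tok in enumerate(row):
            if tok == token:
                return (r, c)
    raise ValueError(f"{token} not found")


def bfs_distances(grid: Grid, src: Cell) -> Dict[Cell, int]:
    """Fewest moves from src to every reachable non-WALL cell."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != "WALL" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def legal_prefix(grid: Grid, path: Sequence[Cell]) -> List[Cell]:
    """Longest prefix starting on START that makes only unit moves onto open cells."""
    start = find(grid, "START")
    if not path or tuple(path[0]) != start:
        return []
    prefix = [start]
    for r, c in path[1:]:
        pr, pc = prefix[-1]
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            break
        if grid[r][c] == "WALL" or abs(r - pr) + abs(c - pc) != 1:
            break
        prefix.append((r, c))
    return prefix


def binary_reward(grid: Grid, path: Sequence[Cell]) -> float:
    # 1 only if every move is legal and the path ends on GOAL.
    prefix = legal_prefix(grid, path)
    ok = len(prefix) == len(path) and bool(prefix) and prefix[-1] == find(grid, "GOAL")
    return 1.0 if ok else 0.0


def continuous_reward(grid: Grid, path: Sequence[Cell]) -> float:
    # Fraction of the START-to-GOAL maze distance closed by the legal prefix.
    to_goal = bfs_distances(grid, find(grid, "GOAL"))
    prefix = legal_prefix(grid, path)
    d0 = to_goal.get(prefix[0]) if prefix else None
    if not d0:
        return 0.0
    best = min(to_goal.get(cell, d0) for cell in prefix)
    return (d0 - best) / d0


def distance_reward(grid: Grid, path: Sequence[Cell]) -> float:
    # Half credit for reaching GOAL, the rest scaled by shortest/actual moves.
    if binary_reward(grid, path) == 0.0:
        return 0.0
    shortest = bfs_distances(grid, find(grid, "START"))[find(grid, "GOAL")]
    return 0.5 + 0.5 * shortest / (len(path) - 1)
```

Under this sketch, a correct but detour-heavy path scores 1 under `binary` and less than 1 under `distance`. A path that stalls halfway earns partial credit only under `continuous`.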
## Schema
Each row has columns: `data_source`, `prompt`, `ability`, `reward_model`,
`extra_info`. `prompt` is a chat-formatted list of messages whose user turn
contains the maze grid in a tokenized form (`WALL`, `PATH`, `START`, `GOAL`,
`NEWLINE`). The target trajectory lives in `extra_info.answer` and
`reward_model.ground_truth`.
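A minimal parsing sketch follows. It assumes that each message in `prompt` is a dict with `role` and `content` keys, and that the grid appears in the user turn as a stream of the five tokens, with `NEWLINE` closing each row. `user_text`, `parse_grid`, and `locate` are hypothetical helpers. The regex also matches tokens written without separating spaces, but it would misfire if the surrounding instruction text contained those uppercase words.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Grid vocabulary named in the schema; NEWLINE is assumed to end a row.
TOKEN_RE = re.compile(r"WALL|PATH|START|GOAL|NEWLINE")


def user_text(prompt: Sequence[Dict[str, str]]) -> str:
    """Return the first user message (assumes role/content keys)."""
    for message in prompt:
        if message.get("role") == "user":
            return message["content"]
    raise ValueError("no user turn in prompt")


def parse_grid(text: str) -> List[List[str]]:
    """Rebuild the 2D grid from the token stream, ignoring any other text."""
    rows: List[List[str]] = []
    current: List[str] = []
    for tok in TOKEN_RE.findall(text):
        if tok == "NEWLINE":
            if current:
                rows.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # tolerate a missing final NEWLINE
        rows.append(current)
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("empty or ragged grid")
    return rows


def locate(grid: List[List[str]], token: str) -> Tuple[int, int]:
    """(row, col) of the first occurrence of token, e.g. START or GOAL."""
    for r, row in enumerate(grid):
        if token in row:
            return (r, row.index(token))
    raise ValueError(f"{token} not found")
```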
## Usage
```python
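from datasets import load_dataset

# Pick one of the three configs: "binary", "continuous", or "distance".
ds = load_dataset("stablegradients/maze-17x17-500k-general", "binary")
# Print the first training example (all five schema columns).
print(ds["train"][0])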
```
Provided by:
stablegradients



