stablegradients/maze-17x17-500k-general

Name: stablegradients/maze-17x17-500k-general
Creator: stablegradients
Published: 2026-04-20 18:14:12
License: apache-2.0

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/stablegradients/maze-17x17-500k-general


Description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- maze
- reasoning
- reinforcement-learning
- sft
size_categories:
- 100K<n<1M
configs:
- config_name: binary
  data_files:
  - split: train
    path: train_binary.parquet
  - split: test
    path: test_binary.parquet
- config_name: continuous
  data_files:
  - split: train
    path: train_continuous.parquet
  - split: test
    path: test_continuous.parquet
- config_name: distance
  data_files:
  - split: train
    path: train_distance.parquet
  - split: test
    path: test_distance.parquet
---

# Maze 17x17 500k - General Paths

Supervised fine-tuning corpus of 17x17 mazes with the target being a valid (not necessarily shortest) path from START to GOAL. Mazes are generated with Prim's algorithm.

## Splits and configs

- **train**: 450,000 examples
- **test**: 512 examples

Three reward configurations are provided. They share the same prompts but differ in the reward-model metadata used by downstream RL:

| config       | reward signal                             |
|--------------|-------------------------------------------|
| `binary`     | 1 if the path reaches the goal, else 0    |
| `continuous` | fractional progress toward the goal       |
| `distance`   | goal reached + solution-quality component |

## Schema

Each row has columns: `data_source`, `prompt`, `ability`, `reward_model`, `extra_info`.

`prompt` is a chat-formatted list of messages whose user turn contains the maze grid in a tokenized form (`WALL`, `PATH`, `START`, `GOAL`, `NEWLINE`). The target trajectory lives in `extra_info.answer` and `reward_model.ground_truth`.

## Usage

```python
from datasets import load_dataset

ds = load_dataset("stablegradients/maze-17x17-500k-general", "binary")
print(ds["train"][0])
```
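
The tokenized grid and the `binary` reward can be illustrated with a small, self-contained sketch. It is not the dataset's own scoring code. It assumes that tokens are whitespace-separated and that a path is a list of `(row, col)` cells; the card does not specify the actual encoding of `extra_info.answer`, so inspect a real row before adapting it. The helper names (`parse_maze`, `find`, `binary_reward`, `shortest_path_length`) and the toy 5x5 maze are hypothetical.

```python
from collections import deque

def parse_maze(tokens: str):
    """Split a whitespace-separated token string into rows of cell tokens."""
    rows, row = [], []
    for tok in tokens.split():
        if tok == "NEWLINE":
            if row:
                rows.append(row)
            row = []
        else:
            row.append(tok)
    if row:
        rows.append(row)
    return rows

def find(grid, token):
    """Return the (row, col) of the first cell equal to `token`."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == token:
                return (r, c)
    raise ValueError(f"{token} not found")

def binary_reward(grid, path):
    """1 if `path` walks from START to GOAL through open, adjacent cells, else 0.

    Assumption: `path` is a list of (row, col) tuples, which may differ from
    the dataset's real answer format.
    """
    if not path or path[0] != find(grid, "START") or path[-1] != find(grid, "GOAL"):
        return 0
    for r, c in path:
        in_bounds = 0 <= r < len(grid) and 0 <= c < len(grid[r])
        if not in_bounds or grid[r][c] == "WALL":
            return 0
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        if abs(r0 - r1) + abs(c0 - c1) != 1:  # each step moves to a 4-neighbour
            return 0
    return 1

def shortest_path_length(grid):
    """Breadth-first search from START to GOAL; returns step count or None."""
    start, goal = find(grid, "START"), find(grid, "GOAL")
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                    and grid[nr][nc] != "WALL" and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

# Toy maze for illustration only (real mazes are 17x17).
TOY = (
    "WALL WALL WALL WALL WALL NEWLINE "
    "WALL START PATH PATH WALL NEWLINE "
    "WALL WALL WALL PATH WALL NEWLINE "
    "WALL PATH PATH GOAL WALL NEWLINE "
    "WALL WALL WALL WALL WALL"
)
grid = parse_maze(TOY)
direct = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3)]
# A valid but non-shortest path: it backtracks once before continuing.
detour = [(1, 1), (1, 2), (1, 1), (1, 2), (1, 3), (2, 3), (3, 3)]
through_wall = [(1, 1), (2, 1), (3, 1), (3, 2), (3, 3)]

print(binary_reward(grid, direct), binary_reward(grid, detour),
      binary_reward(grid, through_wall), shortest_path_length(grid))
```

Because the corpus targets valid rather than shortest paths, the detour scores the same as the direct route under this binary check. A quality-aware signal such as the `distance` config would presumably separate them, for example by comparing path length against the BFS optimum.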

Provided by:

stablegradients
