
stablegradients/maze-17x17-500k-shortest

Hugging Face | Updated 2026-04-20 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/stablegradients/maze-17x17-500k-shortest
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- maze
- reasoning
- reinforcement-learning
- sft
size_categories:
- 100K<n<1M
configs:
- config_name: binary
  data_files:
  - split: train
    path: train_binary.parquet
  - split: test
    path: test_binary.parquet
- config_name: continuous
  data_files:
  - split: train
    path: train_continuous.parquet
  - split: test
    path: test_continuous.parquet
- config_name: distance
  data_files:
  - split: train
    path: train_distance.parquet
  - split: test
    path: test_distance.parquet
---

# Maze 17x17 500k: Shortest Path

Supervised fine-tuning corpus of 17x17 mazes where the target trajectory is the unique shortest path from START to GOAL. Mazes are generated with Prim's algorithm so there is a single solution path.

## Splits and configs

- **train**: 450,000 examples
- **test**: 512 examples

Three reward configurations are provided. They share the same prompts but differ in the reward-model metadata used by downstream RL:

| config       | reward signal                             |
|--------------|-------------------------------------------|
| `binary`     | 1 if the path reaches the goal, else 0    |
| `continuous` | fractional progress toward the goal       |
| `distance`   | goal reached + solution-quality component |

## Schema

Each row has columns: `data_source`, `prompt`, `ability`, `reward_model`, `extra_info`.

`prompt` is a chat-formatted list of messages whose user turn contains the maze grid in a tokenized form (`WALL`, `PATH`, `START`, `GOAL`, `NEWLINE`). The target trajectory lives in `extra_info.answer` and `reward_model.ground_truth`.

## Usage

```python
from datasets import load_dataset

ds = load_dataset("stablegradients/maze-17x17-500k-shortest", "binary")
print(ds["train"][0])
```
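
To make the tokenized grid and the `binary` reward more concrete, here is a minimal sketch of how a maze in this token vocabulary could be parsed, solved with breadth-first search, and scored. It rests on several assumptions that the card does not confirm: tokens are taken to be whitespace-separated, a path is represented as a list of `(row, col)` cells (the real format of `extra_info.answer` may differ), and the names `parse_maze`, `shortest_path`, `binary_reward`, and `DEMO` are hypothetical. The validity checks in `binary_reward` are also an assumption and may be stricter than the reward actually used.

```python
from collections import deque

# Hypothetical 5x5 example using the card's token vocabulary.
DEMO = (
    "WALL WALL WALL WALL WALL NEWLINE "
    "WALL START PATH PATH WALL NEWLINE "
    "WALL WALL WALL PATH WALL NEWLINE "
    "WALL GOAL PATH PATH WALL NEWLINE "
    "WALL WALL WALL WALL WALL"
)

def parse_maze(tokens: str) -> list:
    """Split a whitespace-separated token string into rows at each NEWLINE."""
    rows, row = [], []
    for tok in tokens.split():
        if tok == "NEWLINE":
            if row:
                rows.append(row)
            row = []
        else:
            row.append(tok)
    if row:  # last row when there is no trailing NEWLINE
        rows.append(row)
    return rows

def find(grid: list, token: str) -> tuple:
    """Return the (row, col) of the first cell equal to `token`."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == token:
                return (r, c)
    raise ValueError(f"{token} not found in grid")

def shortest_path(grid: list):
    """Breadth-first search from START to GOAL over non-WALL cells."""
    start, goal = find(grid, "START"), find(grid, "GOAL")
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parent links back to START
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and grid[nr][nc] != "WALL" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # GOAL unreachable

def binary_reward(grid: list, path: list) -> float:
    """1.0 if `path` is a legal walk from START to GOAL, else 0.0 (assumed rule)."""
    if not path or path[0] != find(grid, "START") or path[-1] != find(grid, "GOAL"):
        return 0.0
    for (r, c) in path:
        in_bounds = 0 <= r < len(grid) and 0 <= c < len(grid[r])
        if not in_bounds or grid[r][c] == "WALL":
            return 0.0
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:  # steps must be orthogonal neighbours
            return 0.0
    return 1.0

grid = parse_maze(DEMO)
path = shortest_path(grid)
print(len(path), binary_reward(grid, path))
```

Breadth-first search is used here only because it returns a shortest path on an unweighted grid; it is not necessarily how the dataset's targets were produced.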
Provider:
stablegradients