zeyuzy/puzzle-bench
Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/zeyuzy/puzzle-bench
Description:
---
configs:
- config_name: maze_10x10
  data_files:
  - split: train
    path: maze_10x10/train-*
  - split: test
    path: maze_10x10/test-*
- config_name: maze_15x15
  data_files:
  - split: train
    path: maze_15x15/train-*
  - split: test
    path: maze_15x15/test-*
- config_name: maze_5x5
  data_files:
  - split: train
    path: maze_5x5/train-*
  - split: test
    path: maze_5x5/test-*
- config_name: maze_7x7
  data_files:
  - split: train
    path: maze_7x7/train-*
  - split: test
    path: maze_7x7/test-*
- config_name: sudoku_4x4
  data_files:
  - split: train
    path: sudoku_4x4/train-*
  - split: test
    path: sudoku_4x4/test-*
- config_name: sudoku_9x9
  data_files:
  - split: train
    path: sudoku_9x9/train-*
  - split: test
    path: sudoku_9x9/test-*
license: mit
task_categories:
- text-generation
tags:
- sudoku
- maze
- puzzle
- constraint-satisfaction
- benchmark
language:
- en
size_categories:
- 10K<n<100K
dataset_info:
- config_name: sudoku_4x4
  features:
  - name: puzzle
    dtype: string
  - name: solution
    dtype: string
  - name: empty_count
    dtype: int64
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 592000
    num_examples: 8000
  - name: test
    num_bytes: 148000
    num_examples: 2000
  download_size: 136367
  dataset_size: 740000
- config_name: sudoku_9x9
  features:
  - name: puzzle
    dtype: string
  - name: solution
    dtype: string
  - name: empty_count
    dtype: int64
  - name: steps_count
    dtype: int64
  - name: backtrack_count
    dtype: int64
  - name: max_depth
    dtype: int64
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 9939525
    num_examples: 41784
  - name: test
    num_bytes: 2485478
    num_examples: 10448
  download_size: 6173960
  dataset_size: 12425003
---
# Puzzle Bench
Difficulty-labeled evaluation datasets for **Sudoku** and **Maze** tasks, designed for benchmarking language models on combinatorial reasoning.

**GitHub:** [zeyuzhangzyz/puzzle-bench](https://github.com/zeyuzhangzyz/puzzle-bench)
## Dataset Overview
| Config | Total | Train | Test | Difficulty Labels |
|--------|-------|-------|------|-------------------|
| `sudoku_4x4` | 10,000 | 8,000 | 2,000 | -- |
| `sudoku_9x9` | 52,806 | 42,244 | 10,562 | easy / medium / hard |
| `maze_5x5` | 10,000 | 8,000 | 2,000 | -- |
| `maze_7x7` | 10,000 | 8,000 | 2,000 | -- |
| `maze_10x10` | 10,000 | 8,000 | 2,000 | -- |
| `maze_15x15` | 30,000 | 24,000 | 6,000 | easy / medium / hard |
## Sudoku
### sudoku_4x4
4x4 Sudoku puzzles generated via backtracking with unique-solution verification.
| Column | Description |
|--------|-------------|
| `puzzle` | 16-character string (0 = empty cell) |
| `solution` | 16-character solution |
| `empty_count` | Number of blank cells |
| `source` | Generator identifier |
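
As an illustration of the string format above, the following sketch parses a `puzzle`/`solution` pair and checks that the solution is a complete grid that preserves every clue. It assumes the characters are laid out in row-major order, which this card does not state explicitly; `parse_grid` and `is_valid_solution` are hypothetical helper names, not part of the dataset or its repository.

```python
from math import isqrt


def parse_grid(s: str) -> list[list[int]]:
    """Split a digit string into an n x n grid (row-major order is an assumption)."""
    n = isqrt(len(s))
    if n * n != len(s):
        raise ValueError("string length is not a perfect square")
    return [[int(ch) for ch in s[r * n:(r + 1) * n]] for r in range(n)]


def is_valid_solution(puzzle: str, solution: str) -> bool:
    """Return True if `solution` is a full Sudoku grid that keeps all clues of `puzzle`."""
    if len(puzzle) != len(solution):
        return False
    # Every non-zero clue in the puzzle must appear unchanged in the solution.
    if any(p != "0" and p != s for p, s in zip(puzzle, solution)):
        return False
    grid = parse_grid(solution)
    n = len(grid)
    b = isqrt(n)  # box side: 2 for 4x4, 3 for 9x9
    full = set(range(1, n + 1))
    rows_ok = all(set(row) == full for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == full for c in range(n))
    boxes_ok = all(
        {grid[br + i][bc + j] for i in range(b) for j in range(b)} == full
        for br in range(0, n, b)
        for bc in range(0, n, b)
    )
    return rows_ok and cols_ok and boxes_ok


# Hand-made 4x4 example (not taken from the dataset).
example_puzzle = "1030041020030320"
example_solution = "1234341221434321"
print(is_valid_solution(example_puzzle, example_solution))
print(example_puzzle.count("0"))  # corresponds to `empty_count`
```

The same check works for the 81-character 9x9 strings, since the box side is derived from the string length.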
### sudoku_9x9
9x9 Sudoku puzzles with solver-computed difficulty metrics. Puzzles are drawn from multiple sources to balance the difficulty distribution.
| Column | Description |
|--------|-------------|
| `puzzle` | 81-character string (0 = empty cell) |
| `solution` | 81-character solution |
| `empty_count` | Number of blank cells |
| `steps_count` | Solver step count (MRV + backtracking) |
| `backtrack_count` | Number of backtracks |
| `max_depth` | Maximum recursion depth |
| `difficulty` | easy / medium / hard |
| `source` | Source dataset identifier |

Difficulty is assigned from `backtrack_count`:

| Difficulty | Count | Criterion |
|------------|-------|-----------|
| easy | 29,842 | 0 backtracks (pure logic) |
| medium | 10,000 | 1-1,000 backtracks |
| hard | 12,964 | 1,000+ backtracks |
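
To make the solver metrics concrete, here is a minimal sketch of an MRV (minimum remaining values) backtracking solver that tracks step, backtrack, and depth counters. The exact definitions used to produce `steps_count`, `backtrack_count`, and `max_depth` in this dataset are not given on this card, so the counters below are one plausible interpretation and may not reproduce the published values. `solve_with_stats` is a hypothetical name.

```python
def solve_with_stats(puzzle: str):
    """Solve a 9x9 Sudoku string with MRV + backtracking; return (solution, stats)."""
    grid = [int(ch) for ch in puzzle]
    stats = {"steps": 0, "backtracks": 0, "max_depth": 0}

    def candidates(i: int) -> list[int]:
        r, c = divmod(i, 9)
        used = set(grid[r * 9:(r + 1) * 9])           # values in the row
        used |= {grid[k * 9 + c] for k in range(9)}   # values in the column
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[(br + dr) * 9 + bc + dc] for dr in range(3) for dc in range(3)}
        return [v for v in range(1, 10) if v not in used]

    def search(depth: int) -> bool:
        stats["max_depth"] = max(stats["max_depth"], depth)
        empties = [i for i in range(81) if grid[i] == 0]
        if not empties:
            return True
        # MRV heuristic: branch on the empty cell with the fewest candidates.
        cell, options = min(
            ((i, candidates(i)) for i in empties), key=lambda t: len(t[1])
        )
        for v in options:
            stats["steps"] += 1          # assumption: one step per tentative placement
            grid[cell] = v
            if search(depth + 1):
                return True
            grid[cell] = 0               # undo the placement
            stats["backtracks"] += 1     # assumption: one backtrack per undone placement
        return False

    solved = search(0)
    return ("".join(map(str, grid)) if solved else None), stats


# Widely used example puzzle (not taken from the dataset).
puzzle = "530070000600195000098000060800060003400803001700020006060000280000419005000080079"
solution, stats = solve_with_stats(puzzle)
print(solution)
print(stats)
```

Under this interpretation, a puzzle that finishes with zero backtracks was solved without ever undoing a placement, which matches the "pure logic" description of the easy tier.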
## Maze
Mazes are encoded as binary wall strings and annotated with BFS-computed path metrics. Generation algorithms used: `dfs`, `wilson`, `prim`, `kruskal`, `rdiv`.
| Column | Description |
|--------|-------------|
| `maze` | Binary string encoding walls |
| `start` | Start coordinates `row,col` |
| `goal` | Goal coordinates `row,col` |
| `grid_size` | Grid dimension (5/7/10/15) |
| `algorithm` | Generation algorithm |
| `solution_length` | BFS shortest path length |
| `bfs_nodes` | BFS nodes expanded |
| `source` | Generator identifier |
| `difficulty` | easy / medium / hard (maze_15x15 only, by solution_length tercile) |
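
The sketch below shows how path metrics of this kind can be computed with breadth-first search. The layout of the `maze` wall string is not documented on this card, so the code does not attempt to decode it: it takes an already decoded grid of walkable cells (`open_cells`, a hypothetical representation) and only parses the `row,col` coordinate format. Counting path length in moves and counting a node as expanded when it is dequeued are also assumptions.

```python
from collections import deque


def parse_coord(s: str) -> tuple[int, int]:
    """Parse a 'row,col' string such as the `start` and `goal` columns."""
    r, c = s.split(",")
    return int(r), int(c)


def bfs_metrics(open_cells, start, goal):
    """Return (shortest path length in moves, nodes expanded).

    `open_cells` is a list of rows of booleans, True meaning walkable.
    The path length is None if the goal cannot be reached.
    """
    rows, cols = len(open_cells), len(open_cells[0])
    dist = {start: 0}
    queue = deque([start])
    expanded = 0
    while queue:
        r, c = queue.popleft()
        expanded += 1
        if (r, c) == goal:
            return dist[(r, c)], expanded
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and open_cells[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None, expanded


# Hand-made 3x3 example (not taken from the dataset).
demo = [
    [True, True, False],
    [False, True, False],
    [False, True, True],
]
print(bfs_metrics(demo, parse_coord("0,0"), parse_coord("2,2")))
```

For the actual wall encoding, consult the generator code in the GitHub repository before comparing results against `solution_length` and `bfs_nodes`.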
### maze_15x15 difficulty breakdown
| Difficulty | Train | Test | Total |
|------------|-------|------|-------|
| easy | 8,000 | 2,000 | 10,000 |
| medium | 8,000 | 2,000 | 10,000 |
| hard | 8,000 | 2,000 | 10,000 |
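
A tercile split by `solution_length` can be sketched as a rank-based assignment, which yields three groups of equal size as in the table above. How ties are broken, and whether the split was computed before or after the train/test partition, is not stated here, so both are assumptions in this sketch; `tercile_labels` is a hypothetical name.

```python
def tercile_labels(lengths):
    """Label each value easy / medium / hard by rank, with near-equal group sizes."""
    n = len(lengths)
    # Sort indices by value; sorted() is stable, so ties keep their input order.
    order = sorted(range(n), key=lambda i: lengths[i])
    names = ("easy", "medium", "hard")
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = names[min(3 * rank // n, 2)]
    return labels


# Hand-made path lengths (not taken from the dataset).
print(tercile_labels([12, 40, 25, 8, 31, 19]))
```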
## Usage
```python
from datasets import load_dataset
# 4x4 Sudoku
sudoku_4x4 = load_dataset("zeyuzy/puzzle-bench", "sudoku_4x4")
# 9x9 Sudoku, hard difficulty only
sudoku_9x9 = load_dataset("zeyuzy/puzzle-bench", "sudoku_9x9")
hard = sudoku_9x9["test"].filter(lambda x: x["difficulty"] == "hard")
# Maze with train/test split
maze_5x5 = load_dataset("zeyuzy/puzzle-bench", "maze_5x5")
# Maze 15x15 with difficulty labels
maze_15x15 = load_dataset("zeyuzy/puzzle-bench", "maze_15x15")
hard_maze = maze_15x15["test"].filter(lambda x: x["difficulty"] == "hard")
```
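
For evaluation, one simple option is exact-match accuracy grouped by difficulty label. The sketch below assumes a user-supplied `predict` callable (hypothetical) that maps a puzzle string to a predicted solution string, and it relies on each Sudoku puzzle having a single solution so that string equality is a fair criterion. Rows are plain dictionaries, which is what iterating over a loaded split yields.

```python
from collections import defaultdict


def accuracy_by_difficulty(rows, predict):
    """Exact-match accuracy per `difficulty` value; rows without one go under 'all'."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        label = row.get("difficulty", "all")
        totals[label] += 1
        correct[label] += int(predict(row["puzzle"]) == row["solution"])
    return {label: correct[label] / totals[label] for label in totals}


# Toy rows and a toy predictor, for illustration only.
toy_rows = [
    {"puzzle": "a", "solution": "A", "difficulty": "easy"},
    {"puzzle": "b", "solution": "B", "difficulty": "easy"},
    {"puzzle": "c", "solution": "C", "difficulty": "hard"},
]
print(accuracy_by_difficulty(toy_rows, lambda p: "x" if p == "b" else p.upper()))
```

With the real data, `rows` would be a split such as `sudoku_9x9["test"]` and `predict` would wrap the model under evaluation.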
## Citation
```bibtex
@software{puzzle-bench,
title={Puzzle Bench},
author={Zhang, Zeyu},
url={https://github.com/zeyuzhangzyz/puzzle-bench},
year={2026}
}
```
Provider:
zeyuzy