Mayfull/sudoku

Name: Mayfull/sudoku
Creator: Mayfull
Published: 2026-03-14 12:47:16
License: No description provided

Hugging Face, updated 2026-03-14, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/Mayfull/sudoku

Resource description:

---
dataset_info:
  features:
    - name: puzzle
      dtype: string
    - name: solution
      dtype: string
  splits:
    - name: train
      num_examples: 1000000
    - name: test
      num_examples: 500
license: mit
task_categories:
  - text-generation
tags:
  - sudoku
  - constraint-satisfaction
  - planning
  - reasoning
  - diffusion-language-model
  - reinforcement-learning
size_categories:
  - 1M<n<10M
---

# 4x4 Sudoku Dataset

Standard benchmark dataset for evaluating reasoning capabilities of diffusion language models (dLLMs). This is the dataset used in the following papers:

- [**d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning**](https://arxiv.org/abs/2504.12216) (Zhao et al., 2025)
- [**d2: Improved Techniques for Training Reasoning Diffusion Language Models**](https://arxiv.org/abs/2509.21474) (Wang et al., 2026)
- [**SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models**](https://arxiv.org/abs/2510.09541) (Facebook Research, 2025)

## Dataset Description

4x4 Sudoku puzzles represented as 16-character strings, where `0` denotes an empty cell and digits `1-4` denote filled cells.
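
The string encoding described above can be unpacked with a few lines of Python. This is a minimal sketch, not code from the dataset authors: the helper names `to_grid` and `empty_cells` are hypothetical, and the sample string is the puzzle shown in this card.

```python
from typing import List


def to_grid(s: str) -> List[List[int]]:
    """Split a 16-character string into four rows of four integers."""
    if len(s) != 16 or any(ch not in "01234" for ch in s):
        raise ValueError("expected 16 characters drawn from 0-4")
    # Characters 0-3 form row 1, 4-7 form row 2, and so on.
    return [[int(ch) for ch in s[r * 4:(r + 1) * 4]] for r in range(4)]


def empty_cells(s: str) -> List[int]:
    """Return the string positions (0-15) that hold an empty cell."""
    return [i for i, ch in enumerate(s) if ch == "0"]


grid = to_grid("0010000402400421")
for row in grid:
    print(" ".join(str(v) for v in row))
```

Assuming the dataset is loaded through the Hugging Face `datasets` library, each record's `puzzle` and `solution` fields could be passed to `to_grid` in the same way.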

| Split | Examples |
|-------|----------|
| Train | 1,000,000 |
| Test | 500 |

### Data Format

Each example contains two fields:

- **`puzzle`**: 16-character string representing the puzzle (`0` = empty cell)
- **`solution`**: 16-character string representing the completed grid

```
puzzle:   0010000402400421
solution: 4312213412433421
```

Every 4 characters form one row of the 4x4 grid:

```
Puzzle:        Solution:
0 0 | 1 0      4 3 | 1 2
0 0 | 0 4      2 1 | 3 4
----+----      ----+----
0 2 | 4 0      1 2 | 4 3
0 4 | 2 1      3 4 | 2 1
```

### Rules

- Fill empty cells (0s) with digits 1-4
- Each row must contain digits 1-4 exactly once
- Each column must contain digits 1-4 exactly once
- Each 2x2 box must contain digits 1-4 exactly once

## Typical Experimental Setup

The standard setup used in d1/d2/SPG:

- **Base model**: [LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct)
- **Training**: RL fine-tuning (diffu-GRPO, d2-StepMerge, SPG, etc.)
- **Prompt format**: System prompt with CoT reasoning (`<reasoning>...</reasoning><answer>...</answer>`)
- **Evaluation metric**: Cell-level accuracy (fraction of correctly filled empty cells)

### Model Input Prompt Example

The following is the actual prompt format used in d1/d2/SPG for training and evaluation:

```
[System]
Please solve the following 4x4 Sudoku puzzle. The puzzle is provided as a 16-character string reading left-to-right, top-to-bottom, where '0' represents empty cells.

Rules:
- Fill empty cells with digits 1-4
- Each row must contain digits 1-4 exactly once
- Each column must contain digits 1-4 exactly once
- Each 2x2 box must contain digits 1-4 exactly once

Important: Your solution must be a COMPLETE 16-character string with only the digits 1-4, representing your final solved grid.

Respond in this exact format:
<reasoning>
Your step-by-step solving process
</reasoning>
<answer>
[16-character solution string with no spaces or separators]
</answer>

[User]
Solve the following Sudoku puzzle: 0321003004002100
```

**Expected model output (example - actual CoT content varies by model):**

```
<reasoning>
The puzzle grid is:
0 3 | 2 1
0 0 | 3 0
----+----
0 4 | 0 0
2 1 | 0 0

Row 1: has 3,2,1 → missing 4 → position 0 = 4
Row 2: has 3 → need 1,2,4
  Col 0: has 2 → need 1 or 4; Box0: has 3 → R1C0=4, so need 1 → position 4 = 1
  Col 1: has 3,4,1 → need 2 → position 5 = 2
  Col 3: has 1 → need 2,3,4; Row2 needs 4 → position 7 = 4
Row 3: has 4 → need 1,2,3
  Col 2: has 2,3 → need 1 or 4; Row3 needs 1,2,3 → position 10 = 1
  Col 3: has 1,4 → need 2 or 3; Box3: has 1 → position 11 = 2
Row 4: has 2,1 → need 3,4
  Col 2: has 2,3,1 → need 4 → position 14 = 4
  Col 3: has 1,4,2 → need 3 → position 15 = 3
</reasoning>
<answer>
4321123434122143
</answer>
```

## Citation

```bibtex
@article{zhao2025d1,
  title={d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning},
  author={Zhao, Siyan and Gupta, Devaansh and Zheng, Qinqing and Grover, Aditya},
  journal={arXiv preprint arXiv:2504.12216},
  year={2025}
}

@article{wang2026d2,
  title={d2: Improved Techniques for Training Reasoning Diffusion Language Models},
  author={Wang, Guanghan and Turok, Gilad and Schiff, Yair and Arriola, Marianne and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2509.21474},
  year={2026}
}

@article{spg2025,
  title={SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models},
  author={Facebook Research},
  journal={arXiv preprint arXiv:2510.09541},
  year={2025}
}
```
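
The rules, the `<answer>` output format, and the cell-level accuracy metric described earlier in this card can be combined into a small scoring routine. The sketch below is an illustration under stated assumptions, not the evaluation code from d1/d2/SPG: the function names are hypothetical, whitespace inside the `<answer>` tag is stripped, and a missing or wrongly sized answer is scored as 0.

```python
import re
from typing import Optional


def extract_answer(text: str) -> Optional[str]:
    """Return the contents of <answer>...</answer> without whitespace, or None."""
    m = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    if m is None:
        return None
    return re.sub(r"\s+", "", m.group(1))


def is_valid_solution(puzzle: str, candidate: str) -> bool:
    """Check the row, column, and 2x2 box rules, and that given clues are kept."""
    if len(candidate) != 16 or any(ch not in "1234" for ch in candidate):
        return False
    # Filled cells of the puzzle must be unchanged in the candidate.
    if any(p != "0" and p != c for p, c in zip(puzzle, candidate)):
        return False
    rows = [candidate[r * 4:(r + 1) * 4] for r in range(4)]
    cols = [candidate[c::4] for c in range(4)]
    boxes = [
        "".join(candidate[(br + r) * 4 + bc + c] for r in range(2) for c in range(2))
        for br in (0, 2)
        for bc in (0, 2)
    ]
    return all(set(unit) == set("1234") for unit in rows + cols + boxes)


def cell_accuracy(puzzle: str, solution: str, prediction: Optional[str]) -> float:
    """Fraction of the puzzle's empty cells that the prediction fills correctly."""
    if prediction is None or len(prediction) != 16:
        return 0.0  # assumption: malformed output earns no credit
    empty = [i for i, ch in enumerate(puzzle) if ch == "0"]
    if not empty:
        return 1.0  # assumption: nothing to fill counts as fully correct
    return sum(prediction[i] == solution[i] for i in empty) / len(empty)


output = "<reasoning>...</reasoning>\n<answer>\n4321123434122143\n</answer>"
pred = extract_answer(output)
print(is_valid_solution("0321003004002100", pred))
print(cell_accuracy("0321003004002100", "4321123434122143", pred))
```

Scoring against the reference `solution` string, as `cell_accuracy` does here, assumes each puzzle has a unique solution; `is_valid_solution` checks the rules directly and does not rely on that.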

Provider: Mayfull
