sapientinc/sudoku-extreme
Hugging Face | Updated: 2024-10-17 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/sapientinc/sudoku-extreme
Description:
---
task_categories:
- question-answering
---
# Hardest Sudoku Puzzle Dataset V2
This dataset contains a mixture of easy and very hard Sudoku puzzles collected from the Sudoku community.
## Dataset Composition
### Sources
- [tdoku benchmarks](https://github.com/t-dillon/tdoku/blob/master/benchmarks/README.md#benchmarked-data-sets)
- [enjoysudoku](http://forum.enjoysudoku.com/the-hardest-sudokus-new-thread-t6539-600.html#p277835)
### Easy Puzzles (1.1M)
- puzzles0_kaggle
- puzzles1_unbiased
- puzzles2_17_clue
### Hard Puzzles (3.1M)
- puzzles3_magictour_top1465
- puzzles4_forum_hardest_1905
- puzzles6_forum_hardest_1106
- ph_2010/01_file1.txt
## Dataset Characteristics
- All puzzles have been deduplicated by exact match and randomly permuted by row, column, box, and digit.
- Each puzzle is guaranteed to have a unique solution.
- Puzzles in the train set are [mathematically inequivalent](http://sudopedia.enjoysudoku.com/Mathematically_equivalent.html) to those in the test set.
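The permutations mentioned above rely on transformations that map a valid Sudoku to another valid Sudoku: relabeling digits, reordering bands and the rows inside each band, doing the same for columns, and transposing the grid. The following is a minimal sketch of such a transformation, not the dataset authors' preprocessing code. The function name `transform` and the use of `.` for blank cells are assumptions made for illustration.

```python
import random


def transform(puzzle: str, solution: str, rng: random.Random) -> tuple[str, str]:
    """Apply one random validity-preserving transformation to a flattened
    puzzle/solution pair (81 characters each, row-major order)."""
    # Digit relabeling: a random bijection on 1..9. Any other character
    # (for example a blank marker) passes through unchanged.
    digits = list("123456789")
    shuffled = digits[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(digits, shuffled))

    def line_order() -> list[int]:
        # Shuffle the three bands, then the three lines inside each band.
        bands = [0, 1, 2]
        rng.shuffle(bands)
        order = []
        for band in bands:
            inner = [0, 1, 2]
            rng.shuffle(inner)
            order.extend(3 * band + i for i in inner)
        return order

    rows = line_order()
    cols = line_order()
    transpose = rng.random() < 0.5

    def apply(flat: str) -> str:
        grid = [flat[9 * r:9 * r + 9] for r in range(9)]
        if transpose:
            grid = ["".join(row[c] for row in grid) for c in range(9)]
        # New cell (i, j) is taken from old cell (rows[i], cols[j]).
        return "".join(
            relabel.get(grid[r][c], grid[r][c]) for r in rows for c in cols
        )

    # The same transformation is applied to both strings so that the
    # transformed solution still solves the transformed puzzle.
    return apply(puzzle), apply(solution)
```

Two puzzles related by such a transformation are mathematically equivalent, which is why a train/test split that only removes exact duplicates would not be enough to separate them.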
## Dataset Structure
- Train set: `train.csv` (3.8M examples)
- Test set: `test.csv` (423k examples)
Puzzles and solutions are flattened in row-major order. Each puzzle's rating is the number of backtracks the [tdoku solver](https://github.com/t-dillon/tdoku) needs to solve it (higher is harder).
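As an illustration of the row-major layout, the sketch below turns a flattened 81-character string back into a 9x9 grid and checks a solution against its puzzle. The card does not state the CSV column names or the blank-cell marker, so the helpers operate on plain strings and treat both `.` and `0` as blank; these are assumptions, as are the function names.

```python
def to_grid(flat: str) -> list[list[int]]:
    """Convert an 81-character row-major string into a 9x9 grid of ints.

    Blank cells (assumed to be '.' or '0') become 0.
    """
    if len(flat) != 81:
        raise ValueError(f"expected 81 cells, got {len(flat)}")
    cells = [0 if ch in ".0" else int(ch) for ch in flat]
    # Row r occupies positions 9*r .. 9*r + 8 of the flattened string.
    return [cells[9 * r:9 * r + 9] for r in range(9)]


def is_valid_solution(grid: list[list[int]]) -> bool:
    """Check that every row, column, and 3x3 box contains 1..9 exactly once."""
    full = set(range(1, 10))
    rows_ok = all(set(row) == full for row in grid)
    cols_ok = all(set(grid[r][c] for r in range(9)) == full for c in range(9))
    boxes_ok = all(
        set(grid[br + i][bc + j] for i in range(3) for j in range(3)) == full
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    )
    return rows_ok and cols_ok and boxes_ok


def matches_clues(puzzle: list[list[int]], solution: list[list[int]]) -> bool:
    """Check that the solution agrees with every given (non-blank) clue."""
    return all(
        p == 0 or p == s
        for puzzle_row, solution_row in zip(puzzle, solution)
        for p, s in zip(puzzle_row, solution_row)
    )
```

A row read from `train.csv` or `test.csv` could be passed through `to_grid` once the puzzle and solution columns have been identified from the file header.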
## Usage Guidelines
1. Train models using only the train set.
2. Evaluate models on the test set using exact-match accuracy: a prediction counts as correct only if every digit is correct.
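The exact-accuracy metric can be sketched as follows, assuming predictions and reference solutions are both given as flattened strings in the same order. The function name `exact_accuracy` is hypothetical and not part of any official evaluation script.

```python
def exact_accuracy(predictions: list[str], solutions: list[str]) -> float:
    """Fraction of puzzles whose predicted grid matches the solution exactly.

    A prediction with even one wrong cell counts as incorrect; there is
    no partial credit per cell.
    """
    if len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must have the same length")
    if not solutions:
        return 0.0
    correct = sum(pred == sol for pred, sol in zip(predictions, solutions))
    return correct / len(solutions)
```

This is stricter than per-cell accuracy: a model that fills most cells correctly on every puzzle but never completes a whole grid scores zero.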
Provider:
sapientinc



