
sapientinc/sudoku-extreme

Source: Hugging Face. Updated: 2024-10-17. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/sapientinc/sudoku-extreme
Description:
---
task_categories:
- question-answering
---

# Hardest Sudoku Puzzle Dataset V2

This dataset contains a mixture of easy and very hard Sudoku puzzles collected from the Sudoku community.

## Dataset Composition

### Sources

- [tdoku benchmarks](https://github.com/t-dillon/tdoku/blob/master/benchmarks/README.md#benchmarked-data-sets)
- [enjoysudoku](http://forum.enjoysudoku.com/the-hardest-sudokus-new-thread-t6539-600.html#p277835)

### Easy Puzzles (1.1M)

- puzzles0_kaggle
- puzzles1_unbiased
- puzzles2_17_clue

### Hard Puzzles (3.1M)

- puzzles3_magictour_top1465
- puzzles4_forum_hardest_1905
- puzzles6_forum_hardest_1106
- ph_2010/01_file1.txt

## Dataset Characteristics

- All puzzles have been exact-deduplicated and randomly permuted by row, column, box, and digit.
- Each puzzle is guaranteed to have a unique solution.
- Puzzles in the train set are [mathematically inequivalent](http://sudopedia.enjoysudoku.com/Mathematically_equivalent.html) to those in the test set.

## Dataset Structure

- Train set: `train.csv` (3.8M examples)
- Test set: `test.csv` (423k examples)

Puzzles and solutions are flattened in row-major order. The rating is the number of backtracks the [tdoku solver](https://github.com/t-dillon/tdoku) needs to solve the puzzle (higher is harder).

## Usage Guidelines

1. Train models using only the train set.
2. Evaluate models on the test set using exact accuracy: a prediction counts as correct only if all numbers are correct.
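The row-major layout and the exact-accuracy metric described above can be sketched as follows. This is a minimal illustration, not the dataset authors' evaluation code. It assumes each puzzle and solution is an 81-character string and that blank cells are written as `.` or `0`; the README does not state the blank-cell encoding or the CSV column names, so the helper names `unflatten`, `is_valid_solution`, and `exact_accuracy` are hypothetical and operate on plain strings rather than on specific columns.

```python
from typing import List, Sequence

# Assumption: blank cells are encoded as '.' or '0' (not stated in the README).
BLANK_CHARS = ".0"


def unflatten(flat: str) -> List[List[int]]:
    """Turn an 81-character row-major string into a 9x9 grid (0 means blank)."""
    if len(flat) != 81:
        raise ValueError(f"expected 81 cells, got {len(flat)}")
    cells = [0 if ch in BLANK_CHARS else int(ch) for ch in flat]
    # Row-major order: cell (r, c) sits at index r * 9 + c.
    return [cells[r * 9:(r + 1) * 9] for r in range(9)]


def is_valid_solution(grid: List[List[int]]) -> bool:
    """Check that every row, column, and 3x3 box holds the digits 1..9 once."""
    target = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[br + dr][bc + dc] for dr in range(3) for dc in range(3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == target for unit in rows + cols + boxes)


def exact_accuracy(predictions: Sequence[str], solutions: Sequence[str]) -> float:
    """Fraction of puzzles whose prediction matches the solution in all 81 cells."""
    if len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must have the same length")
    if not solutions:
        return 0.0
    correct = sum(pred == sol for pred, sol in zip(predictions, solutions))
    return correct / len(solutions)
```

Because every puzzle has a unique solution, comparing the predicted string with the reference string cell by cell is equivalent to checking that the prediction is a valid completion of the puzzle. A single wrong digit makes the whole puzzle count as incorrect.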

Provider:
sapientinc