Kanzoet97/Sumo

Source: Hugging Face. Updated: 2025-12-12. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Kanzoet97/Sumo
Description:
The Sudoku-CTC-Reasoning dataset contains reasoning traces for 1351 puzzles featured on the Cracking the Cryptic YouTube channel. It provides rich learning signals for training language models to reason in Sudoku or in a broader range of reasoning-intensive tasks. The dataset has two subsets: raw and processed. The raw subset provides action_data and asr_data extracted from the YouTube videos: action_data is the sequence of actions taken in the SudokuPad app, extracted via a video-to-actions pipeline, and asr_data is the audio transcript of the puzzle video, extracted with Whisper ASR. The processed subset is described in detail in Sudoku-Bench's data_processing README.
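
As a rough illustration of how the two subsets might be accessed, here is a minimal Python sketch using the Hugging Face `datasets` library. It rests on several assumptions not confirmed by this listing: that the subset names `raw` and `processed` are exposed as Hub configuration names, that a `train` split exists, and that `action_data` and `asr_data` appear as top-level record fields. The helper names `load_subset` and `summarize_record` are hypothetical and exist only for this example.

```python
from typing import Any

REPO_ID = "Kanzoet97/Sumo"      # repository name taken from the listing
SUBSETS = ("raw", "processed")  # subset names from the description (assumed to be config names)
RAW_FIELDS = ("action_data", "asr_data")  # field names from the description (assumed top-level)


def load_subset(subset: str, split: str = "train", streaming: bool = True) -> Any:
    """Load one subset from the Hugging Face Hub (hypothetical helper).

    The default split name "train" is an assumption; check the dataset card.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Imported lazily so the validation above works without `datasets` installed.
    from datasets import load_dataset

    # Streaming avoids downloading the full dataset just to inspect a few records.
    return load_dataset(REPO_ID, name=subset, split=split, streaming=streaming)


def summarize_record(record: dict) -> dict:
    """Report which of the documented raw fields are present in a record."""
    return {field: field in record for field in RAW_FIELDS}
```

With network access, `next(iter(load_subset("raw")))` would fetch a first record, and passing it to `summarize_record` would show whether the assumed field names match the actual schema.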
Provided by:
Kanzoet97