GridPuzzle
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/Mihir3009/GridPuzzle
Description:
GridPuzzle is an evaluation set of 274 grid-based puzzles of varying complexity, designed to evaluate the reasoning capabilities of large language models (LLMs). It covers five grid sizes (3x4, 3x5, 4x4, 4x5, and 4x6) and several difficulty levels, with the aim of giving insight into the reasoning errors LLMs make when solving grid puzzles. The core task is to assess the reasoning chains produced while solving these puzzles.
Provided by:
Mihir3009



