gmannem/RecurrReason
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/gmannem/RecurrReason
Description:
RecurrReason is a benchmark for evaluating multi-step reasoning in language models through four symbolic puzzles with different structural properties: Block World, Checkers Jumping, Tower of Hanoi, and River Crossing. The puzzles test models on the validity of intermediate moves, solution optimality, and length generalization. The dataset is split into a training set (N=1-7) and a test set (N=8-10), so that test performance measures out-of-distribution generalization to larger problem sizes. Each puzzle comes with detailed rules and constraints, a state representation format, and example trajectories. The README also provides instructions for loading and using the dataset, citation details, and licensing information.
Provider:
gmannem
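
To make the three evaluation criteria in the description concrete, the following minimal sketch checks move validity and optimality for Tower of Hanoi and mirrors the N=1-7 / N=8-10 size split. It is an illustration only: the state representation (three lists of disk sizes), the move encoding (source peg, destination peg), and the names `solve_hanoi`, `is_valid_solution`, and `is_optimal` are assumptions made here, not the dataset's actual format or API. The only facts relied on are the standard Tower of Hanoi rules and the known optimal length of 2^N - 1 moves.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # hypothetical encoding: (source peg, destination peg)

TRAIN_SIZES = range(1, 8)   # N = 1..7, as stated in the description
TEST_SIZES = range(8, 11)   # N = 8..10, the out-of-distribution sizes


def solve_hanoi(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> List[Move]:
    """Return the classic recursive solution for n disks."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, dst, aux)    # park n-1 disks on the spare peg
        + [(src, dst)]                       # move the largest disk
        + solve_hanoi(n - 1, aux, src, dst)  # bring the n-1 disks on top of it
    )


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay moves; every step must be legal and end with all disks on peg 2."""
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk at the bottom of peg 0
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))


def is_optimal(n: int, moves: List[Move]) -> bool:
    """A valid solution is optimal when it uses exactly 2**n - 1 moves."""
    return is_valid_solution(n, moves) and len(moves) == 2 ** n - 1
```

Running `is_optimal(n, solve_hanoi(n))` over `TEST_SIZES` exercises the length-generalization setting: the rules are unchanged, but the trajectories are longer than any in the training sizes.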



