vitercik-lab/DSR-Bench
Source: Hugging Face. Updated 2025-10-14. Indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/vitercik-lab/DSR-Bench
Overview:
DSR-Bench is a benchmark for testing the structural reasoning ability of large language models. It covers 20 data structures and 30 operations across 6 categories, for a total of 2,700 questions. Tasks are organized by increasing structural complexity, which allows fine-grained analysis of specific reasoning skills. Within each category, individual tasks are designed to isolate different sources of structural complexity, so that structural reasoning is broken down into progressively more challenging tasks. Each data structure task has a concise, well-defined correct final state, which supports deterministic scoring without human or model-based judgment. All tasks are generated efficiently from synthetic distributions, which greatly reduces the risk of contamination from pretraining data and enables large-scale evaluation with minimal human involvement.
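To illustrate what deterministic scoring against a well-defined final state can look like, here is a minimal sketch for a FIFO queue task. It is not the benchmark's own grader: the function names `final_queue_state` and `score`, the `(op, value)` operation format, and the rule that dequeuing from an empty queue is a no-op are all assumptions made for this example.

```python
from collections import deque


def final_queue_state(operations):
    """Replay (op, value) pairs on a FIFO queue and return its final contents."""
    queue = deque()
    for op, value in operations:
        if op == "enqueue":
            queue.append(value)
        elif op == "dequeue":
            # Assumption for this sketch: dequeue on an empty queue does nothing.
            if queue:
                queue.popleft()
        else:
            raise ValueError(f"unknown operation: {op}")
    return list(queue)


def score(model_answer, operations):
    """Return 1 if the answer matches the reference final state exactly, else 0."""
    return int(list(model_answer) == final_queue_state(operations))


# Hypothetical task instance, not taken from the dataset.
ops = [("enqueue", 3), ("enqueue", 7), ("dequeue", None), ("enqueue", 5)]
print(final_queue_state(ops))  # [7, 5]
print(score([7, 5], ops))      # 1
print(score([3, 7, 5], ops))   # 0
```

Because the reference state is computed by replaying the operations, the score depends only on an exact comparison and needs no human or model-based judge.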
Provided by:
vitercik-lab



