vitercik-lab/DSR-Bench-main

Source: Hugging Face. Updated: 2025-05-16. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/vitercik-lab/DSR-Bench-main
Description:
DSR-Bench is a benchmark that tests the structural reasoning ability of large language models (LLMs): the ability to understand and manipulate data according to specific relationships such as order, hierarchy, and connectivity. It covers 20 data structures in 6 categories and 30 operations, for a total of 2,700 questions. Its strengths are hierarchical organization, deterministic evaluation, and low-contamination data. Tasks are organized by increasing structural complexity, which allows fine-grained analysis of specific reasoning skills. Each data structure task has a concise, well-defined correct final state, so scoring is deterministic and unambiguous. All tasks are generated efficiently from synthetic distributions, which significantly reduces the risk of contamination from pretraining data.
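
To make the ideas of synthetic generation and deterministic scoring concrete, here is a minimal Python sketch. It is not the benchmark's actual generator or grader: the function names `make_queue_task` and `score`, the choice of a queue, the operation mix, and the value range are all assumptions made for illustration. It only shows how a seeded random sequence of operations yields one well-defined final state that a prediction can be checked against by exact match.

```python
import random
from collections import deque


def make_queue_task(num_ops, seed):
    """Hypothetical generator: returns (operations, expected final state)."""
    rng = random.Random(seed)  # seeded, so the same task can be regenerated
    queue = deque()
    ops = []
    for _ in range(num_ops):
        # Dequeue only when the queue is non-empty, so every operation is valid.
        if queue and rng.random() < 0.4:
            queue.popleft()
            ops.append(("dequeue", None))
        else:
            value = rng.randint(0, 99)
            queue.append(value)
            ops.append(("enqueue", value))
    return ops, list(queue)


def score(predicted, expected):
    """Deterministic scoring: 1 if the final state matches exactly, else 0."""
    return int(predicted == expected)


ops, expected = make_queue_task(num_ops=8, seed=0)
print(ops)
print(expected)
print(score(expected, expected))
```

Because each task is sampled fresh from a seeded distribution rather than taken from existing text, it is unlikely to appear verbatim in pretraining data, and because the final state is unique, no judge model or fuzzy matching is needed to grade an answer.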
Provider:
vitercik-lab