DNRBench

Name: DNRBench
Creator: maas
Published: 2025-12-05 16:32:32
License: No description provided

ModelScope community. Updated 2025-12-05; indexed 2025-05-03.

Download link:

https://modelscope.cn/datasets/ServiceNow-AI/DNRBench

Resource description:

# DNR Bench

Don’t Reason Bench (DNR Bench) is a novel benchmark designed to expose a vulnerability in current reasoning language models (RLMs): their tendency to over-reason by attempting to solve unsolvable problems, which leads to excessively long responses.

# Data Summary

The DNR Bench dataset contains 150 adversarially crafted prompts divided into five distinct categories:

- Imaginary Reference
- Indifferent
- Math
- Redundant
- Unanswerable

Each category targets a specific failure mode observed in reasoning-optimized LLMs: hallucinating nonexistent references, failing to remain neutral in ambiguous contexts, incorrectly solving flawed math problems, overanalyzing redundant information, or answering questions that lack sufficient data.

# Leaderboard

This dataset is used to test reasoning LLMs on the [DNR Leaderboard on Hugging Face](https://huggingface.co/spaces/ServiceNow-AI/Do-not-reason-bench).

# Citation

```bibtex
@misc{hashemi2025dnrbenchbenchmarkingoverreasoning,
  title={DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs},
  author={Masoud Hashemi and Oluwanifemi Bamgbose and Sathwik Tejaswi Madhusudhan and Jishnu Sethumadhavan Nair and Aman Tiwari and Vikas Yadav},
  year={2025},
  eprint={2503.15793},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.15793},
}
```
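As a rough illustration of the data summary above, the sketch below groups model responses by the five listed categories and reports a mean response length, since excessively long responses are the symptom the benchmark targets. The field names `category` and `response`, the whitespace token count, and the sample records are all assumptions made for illustration; they are not taken from the dataset's actual schema or from the paper's scoring method.

```python
from collections import defaultdict
from statistics import mean

# The five category names come from the dataset card. The field names
# ("category", "response") and the records below are hypothetical placeholders.
CATEGORIES = ("Imaginary Reference", "Indifferent", "Math", "Redundant", "Unanswerable")


def mean_response_length(records):
    """Return the mean whitespace-token count of responses, keyed by category."""
    lengths = defaultdict(list)
    for record in records:
        if record["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {record['category']!r}")
        lengths[record["category"]].append(len(record["response"].split()))
    return {category: mean(values) for category, values in lengths.items()}


# Made-up model outputs, for illustration only (not taken from the benchmark).
sample_records = [
    {"category": "Unanswerable", "response": "There is not enough information to answer."},
    {"category": "Unanswerable", "response": "I cannot determine that."},
    {"category": "Math", "response": "The problem as stated has no solution."},
]

summary = mean_response_length(sample_records)
```

A whitespace split is only a crude proxy for length; a real evaluation would more likely count tokens with the tokenizer of the model under test.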

Provider:

maas

Created:

2025-04-27
