DNRBench
Source: ModelScope community. Updated 2025-12-05; indexed 2025-05-03.
Download link:
https://modelscope.cn/datasets/ServiceNow-AI/DNRBench
Description:
# DNR Bench
Don’t Reason Bench (DNR Bench) is a novel benchmark designed to expose a vulnerability in current reasoning language models (RLMs): their tendency to over-reason by attempting to solve unsolvable problems, which leads to excessively long responses.
# Data Summary
The DNR Bench dataset contains 150 adversarially crafted prompts divided into five distinct categories:
- Imaginary Reference
- Indifferent
- Math
- Redundant
- Unanswerable
Each category targets a specific failure mode observed in reasoning-optimized LLMs, such as hallucinating nonexistent references, failing to remain neutral in ambiguous contexts, incorrectly solving flawed math problems, overanalyzing redundant information, or answering questions that lack sufficient data.
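Since over-reasoning shows up as excessively long responses, one rough way to inspect it is to compare response lengths per category. The following is a minimal sketch under stated assumptions, not the benchmark's own scoring method: the record keys `category` and `response`, the helper `mean_response_length`, and the whitespace token count are all hypothetical choices made for illustration. Only the five category names come from the dataset description.

```python
from collections import defaultdict
from statistics import mean

# The five category names come from the dataset card; everything else here
# (field names, whitespace tokenization) is an illustrative assumption.
CATEGORIES = (
    "Imaginary Reference",
    "Indifferent",
    "Math",
    "Redundant",
    "Unanswerable",
)


def mean_response_length(records):
    """Return the mean whitespace-token count of responses per category.

    `records` is an iterable of dicts with the hypothetical keys
    "category" and "response".
    """
    lengths = defaultdict(list)
    for record in records:
        category = record["category"]
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        lengths[category].append(len(record["response"].split()))
    return {category: mean(values) for category, values in lengths.items()}


# Toy records, invented for illustration only (not taken from the dataset).
sample = [
    {"category": "Math", "response": "The problem statement is contradictory."},
    {"category": "Math", "response": "Let me reconsider. " * 10},
    {"category": "Unanswerable", "response": "There is not enough information."},
]
print(mean_response_length(sample))
```

A whitespace split is only a crude proxy for length; a real evaluation would more likely count tokens with the tokenizer of the model under test.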
# Leaderboard
This dataset is used to evaluate reasoning LLMs on the [DNR Leaderboard on Hugging Face](https://huggingface.co/spaces/ServiceNow-AI/Do-not-reason-bench).
# Citation
```bibtex
@misc{hashemi2025dnrbenchbenchmarkingoverreasoning,
title={DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs},
author={Masoud Hashemi and Oluwanifemi Bamgbose and Sathwik Tejaswi Madhusudhan and Jishnu Sethumadhavan Nair and Aman Tiwari and Vikas Yadav},
year={2025},
eprint={2503.15793},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.15793},
}
```
Provider:
maas
Created:
2025-04-27



