
DNRBench

ModelScope Community | Updated 2025-12-05 | Indexed 2025-05-03
Download link:
https://modelscope.cn/datasets/ServiceNow-AI/DNRBench
Resource description:
# DNR Bench

Don't Reason Bench (DNR Bench) is a novel benchmark designed to expose a vulnerability in current reasoning language models (RLMs): their tendency to over-reason by attempting to solve unsolvable problems, which leads to excessively long responses.

# Data Summary

The DNR Bench dataset contains 150 adversarially crafted prompts divided into five distinct categories:

- Imaginary Reference
- Indifferent
- Math
- Redundant
- Unanswerable

Each category targets a specific failure mode observed in reasoning-optimized LLMs: hallucinating nonexistent references, failing to remain neutral in ambiguous contexts, incorrectly solving flawed math problems, overanalyzing redundant information, or answering questions that lack sufficient data.

# Leaderboard

This dataset is used to test reasoning LLMs on the [DNR Leaderboard on Huggingface](https://huggingface.co/spaces/ServiceNow-AI/Do-not-reason-bench).

# Citation

```bibtex
@misc{hashemi2025dnrbenchbenchmarkingoverreasoning,
  title={DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs},
  author={Masoud Hashemi and Oluwanifemi Bamgbose and Sathwik Tejaswi Madhusudhan and Jishnu Sethumadhavan Nair and Aman Tiwari and Vikas Yadav},
  year={2025},
  eprint={2503.15793},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.15793},
}
```
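To make the data summary concrete, the following is a minimal sketch of how model responses might be tallied across the five categories, with overly long responses flagged. It is not the benchmark's official scoring code. The record fields `category` and `response`, the function `summarize_responses`, and the `word_budget` threshold are all hypothetical names chosen for illustration, and a whitespace word count is only a crude stand-in for response length.

```python
from collections import Counter

# Category names as listed in the dataset card.
CATEGORIES = (
    "Imaginary Reference",
    "Indifferent",
    "Math",
    "Redundant",
    "Unanswerable",
)


def summarize_responses(records, word_budget=200):
    """Return {category: (total, over_budget)} for a list of records.

    Each record is assumed (hypothetically) to be a dict with a
    "category" key and a "response" key holding the model output.
    A response counts as over budget when its whitespace-separated
    word count exceeds word_budget.
    """
    per_category = Counter()
    over_budget = Counter()
    for record in records:
        category = record["category"]
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        per_category[category] += 1
        if len(record["response"].split()) > word_budget:
            over_budget[category] += 1
    return {c: (per_category[c], over_budget[c]) for c in CATEGORIES}


# Toy records made up for this example; they are not taken from the dataset.
sample = [
    {"category": "Math", "response": "This problem has no solution as stated."},
    {"category": "Math", "response": "Let me think. " * 300},
    {"category": "Unanswerable", "response": "There is not enough information to answer."},
]
summary = summarize_responses(sample)
```

With these toy records, the second Math response runs to 900 words and is the only one flagged. A real evaluation would more likely count tokens with the model's own tokenizer and also judge whether the answer is correct.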

Provider:
maas
Created:
2025-04-27