ManagerBench

Name: ManagerBench
Creator: maas
Published: 2026-01-02 16:52:13
License: No description provided

ModelScope Community. Updated 2026-01-02; indexed 2025-12-06.

Download link:

https://modelscope.cn/datasets/AI-ModelScope/ManagerBench

Resource overview:

# ManagerBench

ManagerBench is a benchmark designed to evaluate the decision-making capabilities of large language models (LLMs) as they evolve from conversational assistants into autonomous agents. ManagerBench addresses a critical gap: assessing how models navigate real-world scenarios where the most effective path to achieving operational goals may conflict with human safety.

## Dataset Details

The benchmark evaluates models through realistic, human-validated managerial scenarios where agents must choose between:

1. Pragmatic but potentially harmful actions that successfully achieve operational goals.
2. Safe actions that protect human welfare but may lead to suboptimal operational outcomes.

A parallel control set uses scenarios where potential harm targets only inanimate objects. This makes it possible to measure a model's pragmatism and to identify when its safety concerns are overly conservative.

### Dataset Structure

Each example in the dataset is structured as a list of three items:

1. **System Prompt**: Contains the scenario description and the operational goal.
2. **User Prompt**: Contains the available options for the agent to choose from.
3. **Metadata**: A string separated by "_" containing:
   - `domain` - The domain of the example.
   - `human_harm` - The type of human harm involved.
   - `LLM_incentives` - The type of incentive given to the AI to simulate realistic motivations.
   - `model` - The name of the model that generated the example.

### Parameterization

The examples contain placeholders for **Harm Intensity** and **Benefit Intensity** that should be replaced with specific percentage values:

- **Harm percentage**: Set at either 5% or 15%.
- **Benefit percentage**: Set at either 10% or 50%.

These parameters allow for systematic evaluation across different risk-reward trade-offs.
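The structure and parameterization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code. The placeholder tokens `{harm_percentage}` and `{benefit_percentage}` are hypothetical, since the card does not say how placeholders are spelled in the prompts. The sketch also assumes that the metadata fields appear in the order listed above and that only the final `model` field may itself contain an underscore.

```python
from itertools import product
from typing import NamedTuple

# Percentage values stated in the dataset card.
HARM_PERCENTAGES = (5, 15)
BENEFIT_PERCENTAGES = (10, 50)

# Hypothetical placeholder tokens: inspect the real prompts and adjust.
HARM_PLACEHOLDER = "{harm_percentage}"
BENEFIT_PLACEHOLDER = "{benefit_percentage}"


class Metadata(NamedTuple):
    domain: str
    human_harm: str
    llm_incentives: str
    model: str


def parse_metadata(raw: str) -> Metadata:
    """Split the "_"-separated metadata string into its four fields.

    Assumes the first three fields contain no underscore; maxsplit=3 lets
    the last field (the model name) keep any underscores it has.
    """
    parts = raw.split("_", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 metadata fields, got {len(parts)}: {raw!r}")
    return Metadata(*parts)


def fill_placeholders(text: str, harm: int, benefit: int) -> str:
    """Replace both placeholder tokens with concrete percentage strings."""
    return (
        text.replace(HARM_PLACEHOLDER, f"{harm}%")
        .replace(BENEFIT_PLACEHOLDER, f"{benefit}%")
    )


def expand_example(example):
    """Yield one filled-in copy of an example per harm/benefit combination."""
    system_prompt, user_prompt, raw_metadata = example
    metadata = parse_metadata(raw_metadata)
    for harm, benefit in product(HARM_PERCENTAGES, BENEFIT_PERCENTAGES):
        yield {
            "system": fill_placeholders(system_prompt, harm, benefit),
            "user": fill_placeholders(user_prompt, harm, benefit),
            "metadata": metadata,
            "harm": harm,
            "benefit": benefit,
        }
```

Each example expands into four variants (2 harm levels × 2 benefit levels), which is what makes a systematic sweep over risk-reward trade-offs possible. If a field other than `model` turns out to contain underscores in the real data, the split logic would need a different strategy.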
## Usage

```python
from datasets import load_dataset

dataset = load_dataset("AdiSimhi/managerbench")
```

### Dataset Sources

- **Code:** https://github.com/technion-cs-nlp/ManagerBench
- **Paper:** https://arxiv.org/pdf/2510.00857
- **Website:** https://technion-cs-nlp.github.io/ManagerBench-website/

**BibTeX:**

    @article{simhi2025managerbench,
      title={ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs},
      author={Adi Simhi and Jonathan Herzig and Martin Tutek and Itay Itzhak and Idan Szpektor and Yonatan Belinkov},
      eprint={2510.00857},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2510.00857},
      year={2025}
    }

Provider:

maas

Created:

2025-10-09
