ManagerBench
收藏魔搭社区2026-01-02 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/ManagerBench
下载链接
链接失效反馈官方服务:
资源简介:
# ManagerBench
ManagerBench is a benchmark designed to evaluate the decision-making capabilities of large language models (LLMs) as they evolve from conversational assistants into autonomous agents. ManagerBench addresses a critical gap: assessing how models navigate real-world scenarios where the most effective path to achieving operational goals may conflicts with human safety.
## Dataset Details
The benchmark evaluates models through realistic, human-validated managerial scenarios where agents must choose between:
1. Pragmatic but potentially harmful actions that successfully achieve operational goals.
2. Safe actions that protect human welfare but may lead to suboptimal operational outcomes.
A parallel control set uses scenarios where potential harm targets only inanimate objects, allowing the ability to measure a model's pragmatism and identify when safety concerns may be overly conservative.
### Dataset Structure
Each example in the dataset is structured as a list of three items:
1. **System Prompt**: Contains the scenario description and the operational goal.
2. **User Prompt**: Contains the available options for the agent to choose from.
3. **Metadata**: A string separated by "_" containing:
- `domain` - The domain of the example.
- `human_harm` - Different human harms.
- `LLM_incentives` - Types of incentives for the AI to simulate realistic motivations.
- `model`- The name of the model that generated the example.
### Parameterization
The examples contain placeholders for **Harm Intensity** and **Benefit Intensity** that should be replaced with specific percentage values:
- **Harm percentage**: Set at either 5% or 15%.
- **Benefit percentage**: Set at either 10% or 50%.
These parameters allow for systematic evaluation across different risk-reward trade-offs.
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("AdiSimhi/managerbench")
```
### Dataset Sources
- **Code:** [https://github.com/technion-cs-nlp/ManagerBench ]
- **Paper:** [https://arxiv.org/pdf/2510.00857 ]
- **Website:** [https://technion-cs-nlp.github.io/ManagerBench-website/ ]
**BibTeX:**
@article{simhi2025managerbench,
title={ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs},
author={Adi Simhi and Jonathan Herzig and Martin Tutek and Itay Itzhak and Idan Szpektor and Yonatan Belinkov},
eprint={2510.00857},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2510.00857},
year={2025}}
# ManagerBench
ManagerBench是一款专为评估大语言模型(Large Language Model,LLM)从对话式助手向自主AI智能体演进过程中的决策能力而设计的基准测试集。该基准填补了一项关键研究空白:即评估大语言模型如何在现实场景中进行决策——这类场景下,达成业务运营目标的最优路径可能与人类安全准则相冲突。
## 数据集详情
该基准通过经过人类验证的真实管理类场景对模型进行评估,智能体需在以下两类选项中做出抉择:
1. 务实但存在潜在危害的行动:可成功达成业务运营目标,但可能引发安全风险;
2. 保障人类福祉的安全行动:可保护人类权益,但可能导致业务结果未达最优。
此外还设置了平行对照集,其场景中的潜在危害仅针对无生命物体,可用于量化模型的务实倾向,并识别出模型是否存在过度保守的安全顾虑。
### 数据集结构
数据集中的每个样本均由三个元素组成的列表构成:
1. **系统提示词(System Prompt)**:包含场景描述与业务运营目标;
2. **用户提示词(User Prompt)**:列出智能体可供选择的所有行动选项;
3. **元数据(Metadata)**:由下划线分隔的字符串,包含以下字段:
- `domain`:样本所属的领域;
- `human_harm`:涉及的不同类型人类伤害风险;
- `LLM_incentives`:用于模拟AI真实动机的激励类型;
- `model`:生成该样本的模型名称。
### 参数化设置
样本中包含**危害程度(Harm Intensity)**与**收益程度(Benefit Intensity)**的占位符,需替换为具体的百分比数值:
- 危害百分比:取值固定为5%或15%;
- 收益百分比:取值固定为10%或50%。
上述参数可支持针对不同风险-收益权衡场景的系统化评估。
## 使用方法
python
from datasets import load_dataset
dataset = load_dataset("AdiSimhi/managerbench")
### 数据集来源
- **代码仓库**:[https://github.com/technion-cs-nlp/ManagerBench ]
- **学术论文**:[https://arxiv.org/pdf/2510.00857 ]
- **项目官网**:[https://technion-cs-nlp.github.io/ManagerBench-website/ ]
**BibTeX引用格式:**
bibtex
@article{simhi2025managerbench,
title={ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs},
author={Adi Simhi and Jonathan Herzig and Martin Tutek and Itay Itzhak and Idan Szpektor and Yonatan Belinkov},
eprint={2510.00857},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2510.00857},
year={2025}}
提供机构:
maas
创建时间:
2025-10-09



