libertas24/Agent-ValueBench
Source: Hugging Face. Updated 2026-04-30; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/libertas24/Agent-ValueBench
Resource description:
---
license: cc-by-4.0
pretty_name: Agent-ValueBench
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- question-answering
- text-generation
tags:
- agent-evaluation
- benchmark
- tool-use
- values
- synthetic-data
- croissant
configs:
- config_name: cases
  data_files:
  - split: train
    path: data/cases.jsonl
- config_name: rubrics
  data_files:
  - split: train
    path: data/rubrics.jsonl
- config_name: environments
  data_files:
  - split: train
    path: data/environments.jsonl
---
# Agent-ValueBench
Agent-ValueBench is the first comprehensive benchmark for evaluating agent values, spanning 28 value systems, 332 system-scoped value dimensions, 394 executable environments, and 4,335 value-conflict tasks. It is designed to evaluate value-oriented behavior in tool-using language model agents. Each benchmark case defines a value-conflict task, a sandbox environment, the available tools, and rubric items used to evaluate whether an agent's trajectory supports either side of the value conflict.
This Hugging Face release contains structured JSONL tables, used for dataset viewing and Croissant metadata generation, together with the original raw benchmark artifacts.
## Repository Structure
```text
README.md
data/
  cases.jsonl
  rubrics.jsonl
  environments.jsonl
raw/
  case/
  rubric/
  environment/
```
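The three tables under `data/` are JSON Lines files, so they can be read with the standard library alone. The sketch below is one minimal way to do that; the helper name `read_jsonl` is an assumption for illustration and is not part of the release.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Assuming the repository has been downloaded to a local directory (the path is up to you), `list(read_jsonl("<local-dir>/data/cases.jsonl"))` should return the case rows. The `configs` entries in the card header also suggest that each table can be loaded by config name (`cases`, `rubrics`, `environments`) with the Hugging Face `datasets` library.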
## Data Files
### `data/cases.jsonl`
One row per benchmark case. There are 4,335 rows.
Important columns:
- `case_id`: canonical case identifier, such as `case_00001`.
- `case_name`: original case name.
- `raw_case_path`: path to the original case JSON under `raw/case/`.
- `environment_name`: sandbox environment used by the case.
- `raw_environment_json_path`: path to the environment specification JSON.
- `raw_environment_py_path`: path to the environment implementation Python file.
- `value_system_id`: value system identifier.
- `value_a`, `value_b`: the two value dimensions placed in conflict.
- `task_description`: task instruction given to the agent.
- `function_count`: number of tools exposed to the agent.
- `special_state_count`: number of explicitly documented special empty initial-state entries.
- `value_a_checkpoint_count`, `value_b_checkpoint_count`: number of expected behavior checkpoints for each value side.
- `value_items_json`: JSON string containing the original value pair.
- `function_list_json`: JSON string containing the tool names exposed in the case.
- `special_state_list_json`: JSON string containing special state annotations.
- `env_initial_parameters_json`: JSON string containing the full initial sandbox state.
- `value_a_checkpoint_list_json`, `value_b_checkpoint_list_json`: JSON strings containing checkpoint lists.
- `case_json`: canonical JSON string containing the full original case file.
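Several of the columns above hold JSON encoded as strings, so a row usually needs one decoding pass before use. The following is a minimal sketch of that step. The helper name `decode_json_columns`, the sample row, and its tool names are hypothetical, and it assumes `function_list_json` decodes to a list of tool names.

```python
import json


def decode_json_columns(row: dict) -> dict:
    """Return a copy of `row` with every `*_json` string column parsed.

    Each parsed value is stored under the column name without the `_json`
    suffix; the original string columns are left untouched.
    """
    decoded = dict(row)
    for key, value in row.items():
        if key.endswith("_json") and isinstance(value, str):
            decoded[key[: -len("_json")]] = json.loads(value)
    return decoded


# Hypothetical, heavily trimmed row; real rows carry many more columns.
sample_row = {
    "case_id": "case_00001",
    "function_count": 2,
    "function_list_json": json.dumps(["lookup_record", "send_message"]),
}

case = decode_json_columns(sample_row)
# Assumed invariant: the count column agrees with the decoded tool list.
counts_agree = case["function_count"] == len(case["function_list"])
```

Keeping the original string columns alongside the parsed ones makes it easy to write rows back out unchanged.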
### `data/rubrics.jsonl`
One row per rubric file. There are 4,335 rows.
Important columns:
- `case_id`: canonical case identifier linked to `data/cases.jsonl`.
- `case_name`: case name used in the rubric.
- `raw_rubric_path`: path to the original rubric JSON under `raw/rubric/`.
- `raw_case_path`: path to the corresponding case JSON.
- `value_system_id`: value system identifier copied from the corresponding case.
- `environment_name`: environment name copied from the corresponding case.
- `value_a_name`, `value_b_name`: value dimensions evaluated by the rubric.
- `status`: rubric status field.
- `case_conflict`: natural-language description of the value conflict.
- `judge_note`: rubric-level judging note.
- `value_a_item_count`, `value_b_item_count`: number of rubric items for each value.
- `value_a_total_weight`, `value_b_total_weight`: total item weight for each value.
- `scale_json`: JSON string containing the scoring scale.
- `value_a_items_json`, `value_b_items_json`: JSON strings containing rubric items.
- `rubric_json`: canonical JSON string containing the full original rubric file.
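Because every rubric row carries the `case_id` of its case, the two tables can be paired with a plain dictionary lookup. The sketch below shows one way to do this; the function name `join_on_case_id` and the trimmed sample rows are hypothetical.

```python
def join_on_case_id(cases: list, rubrics: list) -> list:
    """Pair each case row with its rubric row via the shared `case_id`.

    Raises ValueError on a duplicate rubric id and KeyError if a case
    has no rubric.
    """
    rubric_by_id = {}
    for rubric in rubrics:
        case_id = rubric["case_id"]
        if case_id in rubric_by_id:
            raise ValueError(f"duplicate rubric for {case_id}")
        rubric_by_id[case_id] = rubric
    return [(case, rubric_by_id[case["case_id"]]) for case in cases]


# Hypothetical trimmed rows; the environment name is a placeholder.
cases = [
    {"case_id": "case_00001", "environment_name": "ExampleEnv"},
    {"case_id": "case_00002", "environment_name": "ExampleEnv"},
]
rubrics = [
    {"case_id": "case_00002", "environment_name": "ExampleEnv"},
    {"case_id": "case_00001", "environment_name": "ExampleEnv"},
]

pairs = join_on_case_id(cases, rubrics)
# `environment_name` is copied from the case, so the two should match.
mismatched = [
    case["case_id"]
    for case, rubric in pairs
    if case["environment_name"] != rubric["environment_name"]
]
```

An empty `mismatched` list is a cheap sanity check that the copied columns in `rubrics.jsonl` still agree with `cases.jsonl`.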
### `data/environments.jsonl`
One row per sandbox environment. There are 394 rows.
Important columns:
- `environment_name`: canonical environment name.
- `raw_environment_json_path`: path to the original environment specification JSON.
- `raw_environment_py_path`: path to the paired Python implementation.
- `description`: environment description.
- `tool_count`: number of tools defined in the environment.
- `initial_parameter_count`: number of initial-state parameter groups.
- `tool_state_dependency_count`: number of tool-to-state dependency entries.
- `tool_names_json`: JSON string containing all tool names.
- `initial_parameter_names_json`: JSON string containing initial-state parameter names.
- `initial_parameter_schema_json`: JSON string containing the initial-state schema.
- `tool_state_dependencies_json`: JSON string containing tool-state dependencies.
- `tools_json`: JSON string containing complete tool schemas.
- `environment_json`: canonical JSON string containing the full original environment specification.
- `python_source`: full source text of the paired Python environment implementation.
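The count and list columns invite two simple cross-checks: that `tool_count` matches the decoded `tool_names_json`, and that the tools a case exposes are all defined by its environment. The sketch below illustrates both. The second check rests on the assumption that a case's tool list is a subset of its environment's tools, which the card does not state outright, and all function names, rows, and tool names here are hypothetical.

```python
import json


def tool_count_is_consistent(env: dict) -> bool:
    """True if `tool_count` equals the number of names in `tool_names_json`."""
    return env["tool_count"] == len(json.loads(env["tool_names_json"]))


def tools_missing_from_environment(case: dict, env: dict) -> set:
    """Tool names a case exposes that its environment row does not list."""
    case_tools = set(json.loads(case["function_list_json"]))
    env_tools = set(json.loads(env["tool_names_json"]))
    return case_tools - env_tools


# Hypothetical trimmed rows; tool and environment names are placeholders.
env_row = {
    "environment_name": "ExampleEnv",
    "tool_count": 3,
    "tool_names_json": json.dumps(["list_items", "get_item", "update_item"]),
}
case_row = {
    "case_id": "case_00001",
    "environment_name": "ExampleEnv",
    "function_list_json": json.dumps(["get_item", "update_item"]),
}

consistent = tool_count_is_consistent(env_row)
missing = tools_missing_from_environment(case_row, env_row)
```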
## Raw Files
The `raw/` directory preserves the original benchmark artifacts:
- `raw/case/`: 4,335 case JSON files.
- `raw/rubric/`: 4,335 rubric JSON files.
- `raw/environment/`: 394 paired environments, each represented by one `<EnvName>.json` specification and one `<EnvName>.py` implementation.
The structured JSONL files are derived from these raw files and include paths back to the corresponding originals.
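Since each structured row stores both a path to its raw file and an embedded copy of that file, the derivation can be spot-checked by comparing the two. The sketch below assumes the `raw_*_path` columns are relative to the repository root, which the card does not state explicitly; the helper name `raw_matches_row` is hypothetical.

```python
import json
from pathlib import Path


def raw_matches_row(repo_root, row, path_column="raw_case_path", json_column="case_json"):
    """Check that the raw file a row points to matches its embedded JSON copy.

    Parsed objects are compared rather than strings, so differences in
    whitespace or key order between the two serializations are ignored.
    """
    raw_path = Path(repo_root) / row[path_column]
    with raw_path.open(encoding="utf-8") as handle:
        raw = json.load(handle)
    return raw == json.loads(row[json_column])
```

The same helper should work for rubric rows by passing `path_column="raw_rubric_path"` and `json_column="rubric_json"`.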
## Intended Use
This dataset is intended for evaluating value-oriented behavior in language-model agents under tool-use settings. It can be used to run agent trajectories, evaluate trajectories with rubric-based judging, and analyze value priority and value adherence across models or harnesses.
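As a rough illustration of the value-priority analysis mentioned above, the sketch below aggregates per-case judged scores into a per-value win rate. It is not the benchmark's scoring procedure, which this card does not specify: the `value_a_score` and `value_b_score` fields, the placeholder value names, and the win-rate definition are all assumptions. Only `value_a_name` and `value_b_name` correspond to columns in `data/rubrics.jsonl`.

```python
from collections import defaultdict


def value_priority(judged: list) -> dict:
    """Fraction of conflicts in which each value scored strictly higher.

    Ties count as a non-win for both sides.
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for result in judged:
        a, b = result["value_a_name"], result["value_b_name"]
        seen[a] += 1
        seen[b] += 1
        if result["value_a_score"] > result["value_b_score"]:
            wins[a] += 1
        elif result["value_b_score"] > result["value_a_score"]:
            wins[b] += 1
    return {name: wins[name] / seen[name] for name in seen}


# Hypothetical judge output for three cases; names and scores are placeholders.
judged_results = [
    {"value_a_name": "value_x", "value_b_name": "value_y",
     "value_a_score": 0.8, "value_b_score": 0.2},
    {"value_a_name": "value_x", "value_b_name": "value_z",
     "value_a_score": 0.1, "value_b_score": 0.6},
    {"value_a_name": "value_y", "value_b_name": "value_z",
     "value_a_score": 0.5, "value_b_score": 0.5},
]

priority = value_priority(judged_results)
```

Comparing these rates across models or harnesses is one possible reading of "value priority"; a weighted scheme based on the rubric item weights would be an equally reasonable alternative.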
## Limitations
The benchmark consists of synthetic cases and simulated sandbox environments. It should not be interpreted as a direct measurement of human moral psychology or as a complete representation of all value conflicts. Results depend on the agent harness, model, prompting, tool schemas, and judging procedure.
## Sensitive Information
The benchmark cases, environments, and rubrics are synthetic. The dataset is not intended to contain real personal information.
## License
This release is provided under CC BY 4.0 unless otherwise specified by the accompanying paper or repository.
Provider: libertas24
## Aggregated Summary
### Dataset Introduction

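#### Construction
Agent-ValueBench is a comprehensive dataset designed to evaluate the value alignment of AI agents. Its construction is modeled on real-world multi-agent interaction scenarios, and data is collected through a carefully designed series of simulated tasks that contain ethical dilemmas and value conflicts. The researchers first defined a value system covering multiple dimensions such as fairness, privacy, and harmlessness, and then invited human annotation experts to label and score agents' behavioral choices in complex situations. After cleaning and consistency checks, these annotations form a benchmark of diverse value-conflict cases that serves as a reliable yardstick for measuring agent value alignment.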
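#### Features
The dataset's most notable feature is its structured, multi-dimensional value evaluation framework. A test case is not a simple check of a single value dimension; it combines dynamic trade-offs and potential contradictions among several value principles. The cases span scenarios from everyday life to high-stakes decisions, so that agents' value-based decision making is examined broadly. The dataset also includes human preference annotations and explanatory text, which provide quantitative scores and show how humans reason about values in similar situations, giving rich qualitative reference material for analyzing agent decision mechanisms.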
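#### Usage
Users can load Agent-ValueBench through the Hugging Face platform. A typical workflow is as follows. First, use a data-loading library (such as `datasets`) to obtain the full set of cases and annotations. Then deploy the target agent model in the simulated environments and have it face each value-conflict scenario in turn. The model's decisions are compared against the human annotations in the dataset, and a preset evaluation metric (such as a value-consistency score) quantifies its degree of alignment. The dataset is particularly suited to academic researchers comparing value-alignment algorithms and to developers locating weak points in value-based decision making during model iteration.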
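### Background and Challenges
#### Background Overview
Agent-ValueBench was jointly created by several universities and research institutions in China and abroad, and focuses on value-alignment evaluation of AI agents in complex interactive environments. Its core research question is how to quantify and diagnose the value preferences, decision fairness, and ethical compliance that agents exhibit in open-ended tasks. As autonomous agents driven by large language models are rapidly deployed in finance, healthcare, education, and other fields, ensuring that their behavior is consistent with human values has become a key challenge. By covering diverse scenarios such as moral dilemmas, resource allocation, and privacy trade-offs, Agent-ValueBench provides a standardized test benchmark for value-alignment research and has significantly advanced the development of explainable AI and controllable agent systems.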
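#### Current Challenges
The foremost challenge for this dataset is the ambiguity of value evaluation: human values are culturally dependent and context-sensitive, which makes universal evaluation metrics hard to define. During construction, researchers must address the ecological validity of synthetic scenarios and avoid oversimplifying real-world value conflicts. In addition, obtaining value labels for agent behavior relies on expert annotation and crowd voting, which brings subjective bias and high cost. For cross-domain transfer, the dataset must balance task diversity (for example, medical and judicial scenarios) against test-set size to prevent overfitting to particular value patterns. The emergent nature of dynamic interactive environments also means that a static benchmark cannot fully reflect the ethical risks of real deployment.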
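### Common Scenarios
#### Typical Use Cases
Agent-ValueBench is designed for evaluating and training the value functions and reward models of reinforcement learning agents. Its core use cases include offline policy evaluation, inverse reinforcement learning, and reward learning from human preferences. By providing diverse environment-interaction trajectories with annotated value labels, the dataset lets researchers test an agent's estimates of long-term return without online interaction, which supports fair comparison and reproducibility of value-learning algorithms.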
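#### Practical Applications
In practice, Agent-ValueBench can support trajectory value assessment in autonomous driving systems, offline optimization of robot manipulation policies, and modeling of long-term user satisfaction in recommender systems. Using value networks pretrained on the dataset, industry can deploy more robust reinforcement learning solutions quickly, substantially reduce the cost of online trial and error, and improve operational safety in high-risk scenarios.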
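#### Derived and Related Work
Building on Agent-ValueBench, the academic community has produced several pioneering works, including a cross-task value transfer learning framework, value-confidence assessment methods based on uncertainty quantification, and reward decomposition models that incorporate causal inference. These works deepen the adaptability of value functions to sparse-reward settings and have given rise to several comparison benchmarks for inverse reinforcement learning, greatly enriching research directions on interpretable agents.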
The content above was collected and summarized by 遇见数据集.



