hkust-nlp/agentboard

Name: hkust-nlp/agentboard
Creator: hkust-nlp
Published: 2024-06-24 08:04:01
License: no description available

Source: Hugging Face. Updated 2024-06-24. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/hkust-nlp/agentboard


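Resource summary:

AgentBoard is an analytical evaluation board for multi-turn LLM agents. It contains 9 different tasks in 4 types: Embodied AI, Game, Web, and Tool. For each task it reports the number of environments, the number of turns, the action space size, the context length, the progress rate type, the average number of subgoals, and the hard/easy cutoff. The dataset card also gives the download link, the file structure, and a description of the fields in a sample record.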

Provided by:

hkust-nlp

Summary of original information

Dataset overview

Dataset name

AgentBoard

Dataset description

AgentBoard contains 9 diverse tasks that fall into 4 types: Embodied AI, Game, Web, and Tool.

Dataset configurations

Embodied AI
- AlfWorld
- ScienceWorld
- BabyAI
Game
- Jericho
- PDDL
Web
- WebShop
- WebArena
Tool
- Tool-Query
- Tool-Operation
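
The grouping above can be written down as a small lookup table. This is only a sketch: `TASK_TYPES` and `task_type_of` are illustrative names, not part of the dataset or its tooling, and the lowercase task identifiers are assumed to match the configuration names used for the data files.

```python
# Hypothetical lookup table: task type -> lowercase task names.
TASK_TYPES = {
    "Embodied AI": ["alfworld", "scienceworld", "babyai"],
    "Game": ["jericho", "pddl"],
    "Web": ["webshop", "webarena"],
    "Tool": ["tool-query", "tool-operation"],
}


def task_type_of(task: str) -> str:
    """Return the task type for a lowercase task name, or raise KeyError."""
    for task_type, tasks in TASK_TYPES.items():
        if task in tasks:
            return task_type
    raise KeyError(f"unknown task: {task}")
```

For example, `task_type_of("pddl")` returns `"Game"`.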

Data files

Each configuration contains a single test-set file:

alfworld: data/alfworld/test.jsonl
scienceworld: data/scienceworld/test.jsonl
babyai: data/babyai/test.jsonl
jericho: data/jericho/test.jsonl
pddl: data/pddl/test.jsonl
webarena: data/webarena/test.jsonl
webshop: data/webshop/test.jsonl
tool-query: data/tool-query/test.jsonl
tool-operation: data/tool-operation/test.jsonl
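
Because every configuration follows the same `data/<config>/test.jsonl` layout, the path can be derived from the configuration name, and the file can be read with the standard library alone. The sketch below assumes the files have been downloaded locally and that each non-empty line is one JSON object, as the `.jsonl` extension suggests. `data_file_for` and `parse_jsonl` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List

# Configuration names as listed in the data files section.
CONFIGS = [
    "alfworld", "scienceworld", "babyai", "jericho", "pddl",
    "webarena", "webshop", "tool-query", "tool-operation",
]


def data_file_for(config: str, root: str = ".") -> Path:
    """Build the test-set path for a configuration under a local root."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config}")
    return Path(root) / "data" / config / "test.jsonl"


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]
```

With a local copy of the repository, a file could then be read as `parse_jsonl(open(data_file_for("scienceworld"), encoding="utf-8"))`.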

Data statistics

Evaluation data statistics for the 9 environments:

	AlfWorld	ScienceWorld	BabyAI	Jericho	PDDL	WebShop	WebArena	Tool-Query	Tool-Operation
#Environment	134	90	112	20	60	251	245	60	40
#Turn	6	15	10	20	20	3	25	5	6
#Action Space	13	21	8	150	8	2	12	15	16
#Context Length	900	2800	1800	1500	2700	1200	15000	2100	4300
Progress Rate	subgoal	subgoal	subgoal	subgoal	match	match	match	subgoal	subgoal/match
#Avg. Subgoals	3	5	4	6	6	4	6	5	5
Hard/Easy Cutoff	3	3	3	4	6	1	4	4	4
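
To work with these statistics programmatically, a few rows of the table can be transcribed into plain Python structures. The sketch below copies the values exactly as they appear in the table above. The variable names are illustrative, and nothing here comes from the AgentBoard codebase.

```python
# Rows transcribed from the statistics table; column order follows the header.
TASKS = ["AlfWorld", "ScienceWorld", "BabyAI", "Jericho", "PDDL",
         "WebShop", "WebArena", "Tool-Query", "Tool-Operation"]
NUM_ENVIRONMENTS = [134, 90, 112, 20, 60, 251, 245, 60, 40]
CONTEXT_LENGTH = [900, 2800, 1800, 1500, 2700, 1200, 15000, 2100, 4300]
PROGRESS_RATE = ["subgoal", "subgoal", "subgoal", "subgoal", "match",
                 "match", "match", "subgoal", "subgoal/match"]

environments = dict(zip(TASKS, NUM_ENVIRONMENTS))
context_length = dict(zip(TASKS, CONTEXT_LENGTH))

# Total number of environments across all tasks, as listed in the table.
total_environments = sum(environments.values())

# Task with the largest listed context length.
longest_context_task = max(context_length, key=context_length.get)

# Group tasks by the progress-rate type listed for them.
tasks_by_metric = {}
for task, metric in zip(TASKS, PROGRESS_RATE):
    tasks_by_metric.setdefault(metric, []).append(task)
```

Grouping by progress-rate type makes it easy to see which tasks are scored by subgoals and which by matching.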

Data fields

Example data fields for one instance of the ScienceWorld task:

    {
      "task": "scienceworld",
      "id": 0,
      "goal": "Your task is to find the animal with the longest life span. The animals are in the outside location. Focus on the animal with the longest life span.",
      "subgoals": ["You move to the outside.", "You focus on the crocodile egg."],
      "difficulty": "easy",
      "additional_info": {"var": 5, "env_name": "lifespan-longest-lived"}
    }

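Data field descriptions

Field	Description
`task`	Task name of the example, e.g. `alfworld`, `babyai`, `jericho`, `pddl`, `scienceworld`, `tool-operation`, `tool-query`, `webarena`, `webshop`.
`id`	ID of the example.
`goal`	Goal of the example.
`subgoals`	Subgoals of the example.
`difficulty`	Difficulty of the example, e.g. `easy`, `hard`.
`additional_info`	Additional information for the example. Each example has its own additional information.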
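
The six documented fields can be mirrored by a small record type that checks for their presence when parsing a line. This is a minimal sketch, not the project's own loader: `AgentBoardExample` and `parse_example` are hypothetical names, and the type hints are guesses based on the ScienceWorld example (other tasks may store `subgoals` or `additional_info` differently).

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Field names taken from the field description table.
FIELDS = ("task", "id", "goal", "subgoals", "difficulty", "additional_info")


@dataclass
class AgentBoardExample:
    task: str
    id: Any                      # an integer in the ScienceWorld example
    goal: str
    subgoals: Any                # a list of strings in the ScienceWorld example
    difficulty: str              # e.g. "easy" or "hard"
    additional_info: Dict[str, Any]


def parse_example(line: str) -> AgentBoardExample:
    """Parse one JSON line and require all six documented fields."""
    raw = json.loads(line)
    missing = [name for name in FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return AgentBoardExample(**{name: raw[name] for name in FIELDS})
```

Extra keys in a record are ignored rather than rejected, since the source only documents these six fields.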

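Collected summary

Dataset introduction

Background and challenges

Background overview

AgentBoard is a dataset for evaluating the performance of multi-turn LLM agents. It contains 9 diverse tasks across 4 types (Embodied AI, Game, Web, Tool), stores its data in JSON format, and is tagged as a text generation task. Each record carries fields such as subgoals and a difficulty level, which makes the dataset suitable for analyzing and evaluating how agents perform in different scenarios.

The above content was collected and summarized by 遇见数据集.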
