KAgentBench

arXiv, indexed 2025-09-30

Download link:

https://github.com/kwaikeg/kwaiagents


Resource overview:


This dataset is a benchmark for assessing the agent capabilities of large language models with specified evaluation metrics. It is constructed from a corpus of queries, tools, system prompt templates, and memory elements: each query is paired with five system prompt templates and incorporates diverse types of memory elements. The benchmark combines factual queries and time-aware queries, covering 614 unique tools and 43,099 queries drawn from various domains.
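To make the "one query, five system prompt templates" construction concrete, here is a minimal Python sketch. It assumes a simple record holding a query, a tool list, and memory snippets. The field names, the five template strings, the tool names, and the `expand` helper are all hypothetical illustrations, not the data format published in the KwaiAgents repository.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record layout; the real KAgentBench schema may differ.
@dataclass
class BenchItem:
    query: str
    tools: List[str]
    memory: List[str] = field(default_factory=list)

# Five hypothetical system prompt templates. The benchmark pairs each query
# with five templates; these strings are placeholders, not the official ones.
TEMPLATES = [
    "You are a helpful agent.\nTools: {tools}\nMemory: {memory}\nQuery: {query}",
    "Available tools: {tools}\nContext: {memory}\nUser: {query}",
    "Answer the question using the tools if needed.\n[{tools}]\n[{memory}]\n{query}",
    "System: plan step by step.\nTools -> {tools}\nHistory -> {memory}\nTask -> {query}",
    "Tools: {tools}\nKnowledge: {memory}\nQuestion: {query}\nThink, act, then answer.",
]


def expand(item: BenchItem, templates: List[str]) -> List[str]:
    """Render one prompt per template for a single query (hypothetical helper)."""
    tools = ", ".join(item.tools)
    # Fall back to a marker when the item carries no memory elements.
    memory = "; ".join(item.memory) if item.memory else "(none)"
    return [t.format(tools=tools, memory=memory, query=item.query) for t in templates]


if __name__ == "__main__":
    item = BenchItem(
        query="What is the weather in Beijing today?",
        tools=["web_search", "weather_api"],  # hypothetical tool names
        memory=["User prefers Celsius"],
    )
    for prompt in expand(item, TEMPLATES):
        print(prompt, end="\n---\n")
```

Under this assumption, a set of N queries expands into 5 × N prompt variants. Varying the template while holding the query fixed helps show whether a model's agent behavior is robust to how the system prompt is phrased.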


Dataset Introduction

Background

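KAgentBench is an automated evaluation dataset of more than 3,000 human-edited entries for testing agent capabilities. Its evaluation dimensions are planning, tool use, reflection, summarization, and analysis. The dataset is part of the KwaiAgents series of work and was open-sourced by KwaiKEG at Kuaishou Technology.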

