ICIP/LiveMCPBench

Name: ICIP/LiveMCPBench
Creator: ICIP
Published: 2025-08-07 03:05:15
License: not specified

Source: Hugging Face. Updated 2025-08-07; indexed 2025-08-09.

Download link:

https://hf-mirror.com/datasets/ICIP/LiveMCPBench

Description:

LiveMCPBench is the first comprehensive benchmark designed to evaluate LLM agents at scale across diverse Model Context Protocol (MCP) servers. It comprises 95 real-world tasks grounded in the MCP ecosystem, challenging agents to effectively use various tools in daily scenarios within complex, tool-rich, and dynamic environments. To support scalable and reproducible evaluation, LiveMCPBench is complemented by LiveMCPTool (a diverse collection of 70 MCP servers and 527 tools) and LiveMCPEval (an LLM-as-a-Judge framework that enables automated and adaptive evaluation). The benchmark offers a unified framework for benchmarking LLM agents in realistic, tool-rich, and dynamic MCP environments, laying a solid foundation for scalable and reproducible research on agent capabilities.
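
The LLM-as-a-Judge idea mentioned above can be illustrated with a minimal sketch. This is not LiveMCPEval's actual interface: the `TaskRun` record, the `Judge` type, the stub judge, and the tool names in the sample data are all hypothetical, and a real setup would replace the stub with a call to a language model that reads the task and the agent's trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskRun:
    """One agent attempt at one task (hypothetical record layout)."""
    task: str        # natural-language task description
    trajectory: str  # text log of the agent's tool calls and final answer


# A judge maps a run to True (task accomplished) or False.
# In an LLM-as-a-Judge setup this would wrap a model call; here it is a stub.
Judge = Callable[[TaskRun], bool]


def stub_judge(run: TaskRun) -> bool:
    """Placeholder judge: passes a run whose trajectory ends with 'DONE'."""
    return run.trajectory.strip().endswith("DONE")


def success_rate(runs: List[TaskRun], judge: Judge) -> float:
    """Fraction of runs the judge marks as successful (0.0 for no runs)."""
    if not runs:
        return 0.0
    return sum(judge(r) for r in runs) / len(runs)


# Made-up sample data; the tool names are illustrative only.
runs = [
    TaskRun("Find today's weather", "call weather.get -> sunny. DONE"),
    TaskRun("Summarize a PDF", "call pdf.read -> error"),
]
rate = success_rate(runs, stub_judge)
```

Keeping the judge as a plain callable means the aggregation code stays the same whether the verdict comes from a stub, a rule, or a model.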

Provided by:

ICIP
