ScaleAI/MCP-Atlas
Source: Hugging Face. Updated 2025-12-19; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/ScaleAI/MCP-Atlas
Overview:
MCP Atlas is a large-scale benchmark for evaluating tool-use competency. It comprises 36 real MCP servers and 220 tools, with tasks designed to assess tool use in realistic, multi-step workflows. The public release is a 500-task sample of the full MCP Atlas benchmark that closely follows the full benchmark's distribution; each task requires 3-6 tool calls. Each record includes a task ID, the enabled tools, the prompt, GTFA claims, and a trajectory. The dataset can be used to evaluate the coverage of model responses and to run trajectory diagnostics.
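To make the record layout and the coverage idea concrete, here is a minimal Python sketch. It is not the benchmark's official scoring code. The field names (task_id, enabled_tools, prompt, gtfa_claims, trajectory), the example values, and the definition of coverage as the fraction of GTFA claims a response satisfies are all assumptions made for illustration; the released dataset's column names and scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical field names mirroring the structure described above;
    # the actual column names in the released dataset may differ.
    task_id: str
    enabled_tools: list[str]
    prompt: str
    gtfa_claims: list[str]
    trajectory: list[dict]  # assumed: one entry per tool call


def coverage(claim_verdicts: list[bool]) -> float:
    """Fraction of claims judged as satisfied by a model response.

    Assumption: each verdict is True when the response covers that claim.
    """
    if not claim_verdicts:
        raise ValueError("need at least one claim verdict")
    return sum(claim_verdicts) / len(claim_verdicts)


def tool_calls_in_range(task: Task, low: int = 3, high: int = 6) -> bool:
    """Check that a trajectory has the 3-6 tool calls the description mentions."""
    return low <= len(task.trajectory) <= high


# A made-up record, for illustration only (not taken from the dataset).
example = Task(
    task_id="example-001",
    enabled_tools=["search", "fetch", "calculator"],
    prompt="Placeholder prompt text.",
    gtfa_claims=["claim A", "claim B", "claim C", "claim D"],
    trajectory=[
        {"tool": "search"},
        {"tool": "fetch"},
        {"tool": "calculator"},
    ],
)

# Suppose a judge marked three of the four claims as covered.
example_score = coverage([True, True, False, True])
```

Under these assumptions, a response covering three of four claims scores 0.75, and the example trajectory passes the 3-6 tool-call check.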
Provided by:
ScaleAI



