limbic-eval-tool-use-mcp
收藏魔搭社区2025-12-05 更新2025-08-30 收录
下载链接:
https://modelscope.cn/datasets/quotientai/limbic-eval-tool-use-mcp
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Summary
The MCP Tool Call Evaluation Test Dataset is a synthetic dataset designed for evaluating and benchmarking language models' ability to correctly execute function calls in the context of Model Context Protocol (MCP) tools. This dataset contains 9,813 test examples that assess a model's proficiency in:
1. **Tool Selection**: Choosing the correct function from available tools
2. **Parameter Structure**: Providing all required parameters with correct names
3. **Parameter Values**: Supplying appropriate values that match expected data types and user intent
## Data Fields
- **available_tools**: List of available MCP tools with their schemas
- **message_history**: Conversation context leading up to the tool call, containing:
- **user_request**: The original user query that triggered the tool call
- **tool_call**: The actual tool call made by the model (may be correct or incorrect)
- **score**: Ground truth classification of the tool call quality
- **failure_reason**: Detailed explanation of what went wrong (if applicable)
## Dataset Structure
Each instance contains:
```json
{
"available_tools": [
{
"name": "function_name",
"description": "Function description",
"input_schema": {
"type": "object",
"properties": {...},
"required": [...]
}
}
],
"message_history": [
{
"role": "user|assistant",
"content": "Message content"
}
],
"score": "correct|incorrect_tool|incorrect_parameter_names|incorrect_parameter_values",
"failure_reason": "Description of failure (if any)",
}
```
## Dataset Creation
### Curation Rationale
This dataset was created to address the need for standardized evaluation of language models' tool-calling capabilities in the context of MCP (Model Context Protocol). The synthetic nature allows for controlled testing scenarios and comprehensive coverage of various failure modes.
### Source Data
#### Initial Data Collection and Normalization
The dataset was synthetically generated using a combination of:
- Real MCP server definitions from the Smithery registry
- Automated tool call generation with intentional errors
- Manual validation and quality control
### Scores
Each example was automatically labeled based on predefined criteria:
- **correct**: Tool call matches available tools and parameters exactly and achieves user request
- **incorrect_tool**: Function name doesn't exist in available tools or incorrect function was chosen
- **incorrect_parameter_names**: Correct function was chosen but parameter names are wrong
- **incorrect_parameter_values**: Function and parameters are correct but values are inappropriate
```bibtex
@dataset{mcp_tool_call_eval_test,
title={MCP Tool Call Evaluation Test Dataset},
author={QuotientAI},
year={2025},
url={https://huggingface.co/datasets/quotientai/limbic-eval-tool-use-mcp}
}
```
# 数据集概述
MCP工具调用评估测试数据集是一款合成数据集,旨在评估与基准测试大语言模型(Large Language Model)在模型上下文协议(Model Context Protocol,MCP)工具场景下正确执行函数调用的能力。本数据集包含9813条测试样本,用于评估模型在以下三方面的熟练度:
1. **工具选择**:从可用工具中选取正确的函数
2. **参数结构**:提供所有必填参数且参数名称无误
3. **参数值**:提供符合预期数据类型与用户意图的合理参数值
## 数据字段
- **available_tools**:包含各MCP工具及其架构信息的可用工具列表
- **message_history**:触发工具调用前的对话上下文,包含:
- **user_request**:触发工具调用的原始用户查询
- **tool_call**:模型实际生成的工具调用(可能正确或存在错误)
- **score**:工具调用质量的真实标注分类
- **failure_reason**:工具调用出错时的详细错误说明(如适用)
## 数据集结构
每个样本包含如下格式内容:
json
{
"available_tools": [
{
"name": "function_name",
"description": "Function description",
"input_schema": {
"type": "object",
"properties": {...},
"required": [...]
}
}
],
"message_history": [
{
"role": "user|assistant",
"content": "Message content"
}
],
"score": "correct|incorrect_tool|incorrect_parameter_names|incorrect_parameter_values",
"failure_reason": "Description of failure (if any)"
}
## 数据集构建
### 筛选依据
本数据集的构建旨在满足对大语言模型在MCP工具场景下的工具调用能力进行标准化评估的需求。合成数据集的特性可实现可控的测试场景,并全面覆盖各类错误模式。
### 源数据
#### 初始数据收集与标准化处理
本数据集通过以下组合方式合成生成:
- 取自Smithery注册表的真实MCP服务器定义
- 带有故意错误的自动化工具调用生成流程
- 人工验证与质量管控环节
### 评分规则
每条样本均基于预设规则自动标注:
- **correct**:工具调用与可用工具及参数完全匹配,且可满足用户请求
- **incorrect_tool**:函数名不存在于可用工具列表中,或选取了错误的函数
- **incorrect_parameter_names**:选取了正确的函数,但参数名称有误
- **incorrect_parameter_values**:函数与参数名称均正确,但参数值不合理
bibtex
@dataset{mcp_tool_call_eval_test,
title={MCP Tool Call Evaluation Test Dataset},
author={QuotientAI},
year={2025},
url={https://huggingface.co/datasets/quotientai/limbic-eval-tool-use-mcp}
}
提供机构:
maas
创建时间:
2025-07-28



