Amap MCP Tool Calling Training Data
Source: ModelScope community | Updated: 2026-01-07 | Indexed: 2025-08-23
Download link:
https://modelscope.cn/datasets/holmeslei/Amap_MCP_FC
Resource overview:
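# Amap MCP Tool Calling Training Dataset
## Dataset Overview
This dataset is built specifically for training large language models to call Amap MCP (Model Context Protocol) tools. It contains real tool-calling scenarios that cover common Amap API use cases such as location lookup, route planning, and POI (point of interest) search.
## Dataset Features
- 🎯 **Real-world scenarios**: built from actual usage scenarios of the Amap API
- 🔧 **Complete workflow**: each sample covers the full dialogue, from the user query through the tool calls to the final reply
- 📝 **Hermes format**: uses the Hermes function-calling prompt template, which simplifies model training and deployment
- 🌐 **Optimized for Chinese**: targets Chinese-language geographic queries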
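## Dataset Structure
Each sample contains the following three core fields:
### 1. `tools` field
Definitions of the Amap API tools the model may use. Each tool includes:
- Tool name
- Tool description
- Parameter definitions and types
- Description of the return value
Example tools:
- Geocoding (address to coordinates)
- Reverse geocoding (coordinates to address)
- POI search
- Route planning
- Weather lookup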
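Tool definitions are easy to get subtly wrong. A common slip is listing a required parameter that is never declared. The sketch below is a hypothetical sanity check and is not part of the dataset's own tooling. It assumes each entry follows the JSON-Schema-style layout used in the data format example: `name`, `description`, and a `parameters` object with `properties` and `required`. The card also mentions return-value descriptions, but the example shows no field for them, so the check does not look for one.

```python
from typing import Any


def check_tool_definition(tool: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    for key in ("name", "description", "parameters"):
        if key not in tool:
            problems.append(f"missing '{key}'")
    params = tool.get("parameters", {})
    if params.get("type") != "object":
        problems.append("parameters.type should be 'object'")
    properties = params.get("properties", {})
    for name, spec in properties.items():
        if "type" not in spec:
            problems.append(f"property '{name}' has no type")
    # Every required parameter must also be declared under properties
    for name in params.get("required", []):
        if name not in properties:
            problems.append(f"required parameter '{name}' is not declared")
    return problems


geocode_tool = {
    "name": "geocode",
    "description": "Convert an address to longitude/latitude coordinates",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {"type": "string", "description": "Address to convert"},
            "city": {"type": "string", "description": "City name (optional)"},
        },
        "required": ["address"],
    },
}
print(check_tool_definition(geocode_tool))  # → []
```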
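### 2. `query` field
The user's original query, including but not limited to:
- Location lookup: "Where exactly is Tiananmen in Beijing?"
- Route planning: "How do I get from Shanghai Hongqiao Airport to the Bund?"
- POI search: "Are there any good restaurants nearby?"
- Composite query: "Find hotels near West Lake in Hangzhou, and plan a route there from the railway station."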
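### 3. `messages` field
The complete list of dialogue messages, including:
- **User message**: the user's original query
- **Assistant reasoning**: the model's reasoning process (optional)
- **Tool call**: the concrete API call request
- **Tool result**: the result returned by the API
- **Final reply**: the assistant's answer, generated from the tool results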
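Each stage in the list above corresponds to a message role. For assistant messages, the stage also depends on whether `tool_calls` is present. The hypothetical helper below labels a `messages` list on that basis. The card does not say where the optional reasoning is stored, so this sketch does not try to detect it. Any text that accompanies a tool call is counted as part of the tool-call step, and an assistant message without `tool_calls` is labeled as a reply.

```python
def label_messages(messages: list[dict]) -> list[str]:
    """Label each message with the dialogue stage it represents."""
    labels = []
    for msg in messages:
        role = msg.get("role")
        if role == "user":
            labels.append("user_message")
        elif role == "tool":
            labels.append("tool_result")
        elif role == "assistant" and msg.get("tool_calls"):
            # Assistant turn that requests one or more tool calls
            labels.append("tool_call")
        elif role == "assistant":
            # Assistant turn with no tool calls: a reply to the user
            labels.append("assistant_reply")
        else:
            labels.append(f"other:{role}")
    return labels


example = [
    {"role": "user", "content": "What are the coordinates of the Forbidden City?"},
    {"role": "assistant", "content": "Let me look that up.",
     "tool_calls": [{"id": "call_001", "type": "function",
                     "function": {"name": "geocode", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_001", "content": "{}"},
    {"role": "assistant", "content": "Longitude 116.397026, latitude 39.918058."},
]
print(label_messages(example))
# → ['user_message', 'tool_call', 'tool_result', 'assistant_reply']
```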
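## Data Format Example
```json
{
  "tools": [
    {
      "name": "geocode",
      "description": "Convert an address to longitude/latitude coordinates",
      "parameters": {
        "type": "object",
        "properties": {
          "address": {
            "type": "string",
            "description": "Address to convert"
          },
          "city": {
            "type": "string",
            "description": "City name (optional)"
          }
        },
        "required": ["address"]
      }
    }
  ],
  "query": "What are the longitude and latitude of the Forbidden City in Beijing?",
  "messages": [
    {
      "role": "user",
      "content": "What are the longitude and latitude of the Forbidden City in Beijing?"
    },
    {
      "role": "assistant",
      "content": "Let me look up the coordinates of the Forbidden City in Beijing.",
      "tool_calls": [
        {
          "id": "call_001",
          "type": "function",
          "function": {
            "name": "geocode",
            "arguments": "{\"address\": \"Forbidden City, Beijing\", \"city\": \"Beijing\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_001",
      "content": "{\"status\": \"1\", \"geocodes\": [{\"location\": \"116.397026,39.918058\", \"formatted_address\": \"No. 4 Jingshan Front Street, Dongcheng District, Beijing\"}]}"
    },
    {
      "role": "assistant",
      "content": "The coordinates of the Forbidden City in Beijing are:\n- Longitude: 116.397026\n- Latitude: 39.918058\n\nThe Forbidden City is at No. 4 Jingshan Front Street, Dongcheng District, Beijing; this coordinate point lies at the center of the Forbidden City."
    }
  ]
}
```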
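In this format, `arguments` and the tool result `content` are JSON-encoded strings. Calls are also linked to their results only through `id`/`tool_call_id`. A sample can therefore be malformed in ways a JSON parser will not catch. The sketch below is a minimal consistency check. It assumes every sample has the layout shown above, and the function and its rules are illustrative rather than the dataset's own validation.

```python
import json


def validate_sample(sample: dict) -> list[str]:
    """Return consistency errors for one sample (empty list if none found)."""
    errors = []
    tools = {t["name"]: t for t in sample["tools"]}
    pending = {}  # tool_call_id -> tool name, still waiting for a tool message
    for i, msg in enumerate(sample["messages"]):
        for call in msg.get("tool_calls") or []:
            fn = call["function"]
            pending[call["id"]] = fn["name"]
            if fn["name"] not in tools:
                errors.append(f"message {i}: unknown tool {fn['name']!r}")
                continue
            try:
                # arguments is a JSON string, so it must be decoded separately
                args = json.loads(fn["arguments"])
            except json.JSONDecodeError:
                errors.append(f"message {i}: arguments are not valid JSON")
                continue
            required = tools[fn["name"]]["parameters"].get("required", [])
            missing = [p for p in required if p not in args]
            if missing:
                errors.append(f"message {i}: missing required {missing}")
        if msg["role"] == "tool":
            # Each tool result must answer an earlier, still-open tool call
            if pending.pop(msg.get("tool_call_id"), None) is None:
                errors.append(f"message {i}: no matching tool call")
    if pending:
        errors.append(f"unanswered tool calls: {sorted(pending)}")
    return errors


sample = {
    "tools": [{"name": "geocode", "description": "Convert an address to coordinates",
               "parameters": {"type": "object",
                              "properties": {"address": {"type": "string"},
                                             "city": {"type": "string"}},
                              "required": ["address"]}}],
    "messages": [
        {"role": "user", "content": "What are the coordinates of the Forbidden City?"},
        {"role": "assistant", "content": "Let me look that up.",
         "tool_calls": [{"id": "call_001", "type": "function",
                         "function": {"name": "geocode",
                                      "arguments": '{"address": "Forbidden City, Beijing", "city": "Beijing"}'}}]},
        {"role": "tool", "tool_call_id": "call_001",
         "content": '{"status": "1", "geocodes": [{"location": "116.397026,39.918058"}]}'},
        {"role": "assistant", "content": "Longitude 116.397026, latitude 39.918058."},
    ],
}
print(validate_sample(sample))  # → []
```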
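## Hermes Function Calling Format
This dataset uses the Hermes format for function calling, a widely used convention with the following characteristics:
- Tool calls are stored in the standard `tool_calls` field of an assistant message (the same structure used by OpenAI-compatible chat formats)
- Parallel calls are supported: one assistant message can carry several entries in `tool_calls`
- Every call has an `id`, and the tool message that answers it references that ID in `tool_call_id`
- Arguments are passed as structured JSON, stored as a JSON-encoded string in `arguments`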
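A Hermes-style prompt emits each tool call as a JSON object inside `<tool_call></tool_call>` tags. Each tool result is wrapped in `<tool_response></tool_response>` tags. The sketch below shows that conversion for the message structure used in this dataset. It covers only the tagged bodies, because role markers and special tokens come from the chat template of the model being trained. Templates also differ; some, for instance, place tool results inside a user turn. Treat this as illustrative, not as the exact text any particular model expects.

```python
import json


def render_tool_calls(message: dict) -> str:
    """Render an assistant message's tool_calls as Hermes-style <tool_call> blocks."""
    parts = [message.get("content") or ""]
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Decode the stored JSON string so the rendered call contains a real
        # JSON object for "arguments" instead of a doubly quoted string.
        payload = {"name": fn["name"], "arguments": json.loads(fn["arguments"])}
        parts.append("<tool_call>\n" + json.dumps(payload, ensure_ascii=False) + "\n</tool_call>")
    # Drop empty content so a message with only tool calls starts with a tag
    return "\n".join(p for p in parts if p)


def render_tool_response(message: dict) -> str:
    """Wrap a tool message's content in <tool_response> tags."""
    return "<tool_response>\n" + message["content"] + "\n</tool_response>"


# Two calls in one assistant message (parallel tool calling)
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"id": "call_001", "type": "function",
         "function": {"name": "geocode", "arguments": '{"address": "West Lake, Hangzhou"}'}},
        {"id": "call_002", "type": "function",
         "function": {"name": "geocode", "arguments": '{"address": "Hangzhou Railway Station"}'}},
    ],
}
print(render_tool_calls(assistant_msg))
```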
## Usage
### Install Dependencies
```bash
pip install datasets
```
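### Load the Dataset
```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub.
# "your-username/amap-mcp-tools-dataset" is a placeholder repo ID; the dataset
# is published on ModelScope as holmeslei/Amap_MCP_FC (see the download link above).
dataset = load_dataset("your-username/amap-mcp-tools-dataset")
# Inspect the splits and features
print(dataset)
# Access the training split
train_data = dataset["train"]
# View the first sample
print(train_data[0])
```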
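The card does not specify how the files on ModelScope store the nested fields. `tools` or `messages` may come back as JSON strings rather than lists, for example if they were serialized to text when the files were exported. In that case, a small normalization step can decode them before any preprocessing. It is sketched here as a hypothetical helper:

```python
import json
from typing import Any


def normalize_sample(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the sample with JSON-string fields decoded."""
    out = dict(sample)
    for key in ("tools", "messages"):
        value = out.get(key)
        if isinstance(value, str):
            out[key] = json.loads(value)
    return out


row = {
    "query": "What are the coordinates of the Forbidden City?",
    "tools": '[{"name": "geocode"}]',
    "messages": [{"role": "user", "content": "What are the coordinates of the Forbidden City?"}],
}
print(normalize_sample(row)["tools"][0]["name"])  # → geocode
```

The helper leaves fields that are already lists untouched and returns a copy, so the original row is not modified.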
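### Data Preprocessing Example
```python
import json


def format_for_training(example):
    """Prepend a system prompt that lists the tools, then append the dialogue."""
    tools = example["tools"]
    # Serialize tool definitions as JSON; interpolating a Python list directly
    # would produce a Python repr (single quotes), not valid JSON.
    if not isinstance(tools, str):
        tools = json.dumps(tools, ensure_ascii=False)
    system_prompt = f"You have access to the following tools:\n{tools}"
    formatted_messages = [{"role": "system", "content": system_prompt}]
    # Append the original dialogue messages unchanged
    formatted_messages.extend(example["messages"])
    return {"formatted_messages": formatted_messages}


# Apply preprocessing to every split
processed_dataset = dataset.map(format_for_training)
```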
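The preprocessing above uses a generic system prompt. Hermes-style templates instead list each tool's JSON signature, one per line, inside `<tools></tools>` tags, and they tell the model to answer with `<tool_call>` blocks. Below is a sketch of such a builder. The wording is illustrative. In practice, copy the exact text from the chat template of the model you fine-tune so that training prompts match what the model sees at inference.

```python
import json


def build_hermes_system_prompt(tools: list[dict]) -> str:
    """Build a Hermes-style system prompt that lists tools inside <tools></tools>."""
    # One JSON object per line; ensure_ascii=False keeps Chinese text readable
    tool_lines = "\n".join(json.dumps(tool, ensure_ascii=False) for tool in tools)
    # Illustrative wording, not the exact text of any particular model's template
    return (
        "You may call one or more functions to assist with the user query.\n"
        "You are provided with function signatures within <tools></tools> XML tags:\n"
        f"<tools>\n{tool_lines}\n</tools>\n"
        "For each function call, return a JSON object with the function name and "
        "arguments within <tool_call></tool_call> XML tags."
    )


tools = [{"name": "geocode", "description": "Convert an address to coordinates"}]
print(build_hermes_system_prompt(tools))
```

If your tokenizer's chat template supports tools, recent versions of Hugging Face `transformers` can build this prompt directly with `tokenizer.apply_chat_template(messages, tools=tools, tokenize=False)`, so you do not have to maintain the wording by hand.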
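## Training Recommendations
1. **Model selection**: use a base model that supports function calling, such as Qwen or GLM.
2. **Hyperparameters**:
   - Learning rate: 1e-5 to 5e-5
   - Batch size: adjust to available GPU memory; 4-16 is recommended
   - Epochs: 3-5
3. **Evaluation metrics**:
   - Tool-call accuracy
   - Parameter extraction accuracy
   - End-to-end task completion rate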
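The first two metrics can be computed offline by comparing predicted tool calls with the reference calls in held-out samples. The sketch below assumes one call per sample. A call counts as correct at the tool level when the function name matches. It counts as correct at the parameter level when the decoded argument objects are also identical, so key order and whitespace do not matter. These definitions are assumptions, because the card does not define the metrics precisely, and the `weather` tool in the example is hypothetical. End-to-end completion depends on executing the calls or judging the final replies, and it is not covered here.

```python
import json


def tool_call_metrics(predicted: list[dict], reference: list[dict]) -> dict[str, float]:
    """Compare aligned predicted and reference calls of the form {"name", "arguments"}.

    tool_call_accuracy: fraction with the correct function name.
    parameter_accuracy: fraction with the correct name AND equal decoded arguments.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be aligned one-to-one")
    name_hits = arg_hits = 0
    for pred, ref in zip(predicted, reference):
        if pred["name"] != ref["name"]:
            continue
        name_hits += 1
        try:
            # Compare parsed objects, not raw strings
            if json.loads(pred["arguments"]) == json.loads(ref["arguments"]):
                arg_hits += 1
        except json.JSONDecodeError:
            pass  # unparseable arguments count as a parameter miss
    n = len(reference) or 1
    return {"tool_call_accuracy": name_hits / n, "parameter_accuracy": arg_hits / n}


reference = [
    {"name": "geocode", "arguments": '{"address": "Forbidden City, Beijing", "city": "Beijing"}'},
    {"name": "geocode", "arguments": '{"address": "The Bund, Shanghai"}'},
]
predicted = [
    {"name": "geocode", "arguments": '{"city": "Beijing", "address": "Forbidden City, Beijing"}'},
    {"name": "weather", "arguments": '{"city": "Shanghai"}'},
]
print(tool_call_metrics(predicted, reference))
# → {'tool_call_accuracy': 0.5, 'parameter_accuracy': 0.5}
```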
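## Dataset Statistics
- **Total samples**: [dataset size]
- **Typical dialogue length**: 3-5 turns
- **Number of tool types**: [number of tools]
- **Scenarios covered**: geocoding, route planning, POI search, weather lookup, and more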
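The sample count and the number of tool types are left as placeholders above, and both can be computed once the data is loaded. The minimal sketch below assumes each sample is a dict with the `tools` and `messages` fields described earlier. Because the card does not define a turn, it counts every message toward dialogue length, including tool messages.

```python
from collections import Counter


def dataset_stats(samples: list[dict]) -> dict:
    """Count samples, average messages per sample, and distinct tool names."""
    tool_names = Counter()  # how many samples define each tool
    total_messages = 0
    for sample in samples:
        total_messages += len(sample["messages"])
        tool_names.update(tool["name"] for tool in sample["tools"])
    n = len(samples)
    return {
        "total_samples": n,
        "avg_messages_per_sample": total_messages / n if n else 0.0,
        "distinct_tools": len(tool_names),
        "tool_frequency": dict(tool_names),
    }


samples = [
    {"tools": [{"name": "geocode"}],
     "messages": [{"role": "user"}, {"role": "assistant"}, {"role": "tool"}, {"role": "assistant"}]},
    {"tools": [{"name": "geocode"}, {"name": "weather"}],
     "messages": [{"role": "user"}, {"role": "assistant"}]},
]
print(dataset_stats(samples))
```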
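## License
This dataset is released under the Apache-2.0 license. When using it, please comply with:
- The Amap API terms of service
- Applicable data protection regulations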
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{amap_mcp_tools_2024,
title={Amap MCP Tool Calling Training Dataset},
author={Your Name},
year={2024},
publisher={Hugging Face}
}
```
**Note**: Models trained on this dataset need a valid Amap API key to call the Amap API in practice.
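At inference time, the model's tool calls have to be executed against Amap using that key. The sketch below assumes the `geocode` tool maps to the Amap Web Service geocoding endpoint `https://restapi.amap.com/v3/geocode/geo`. That endpoint's response has the `status`/`geocodes` shape seen in the data format example. The card itself does not say how its tool names map to endpoints or to the tools of an Amap MCP server. Under that assumption, the request URL can be built from the call's arguments and the first result parsed back as shown. `YOUR_AMAP_KEY` is a placeholder, and the HTTP request itself is left out.

```python
import json
from urllib.parse import urlencode

# Assumed mapping: the dataset's `geocode` tool -> Amap Web Service geocoding (v3)
GEOCODE_URL = "https://restapi.amap.com/v3/geocode/geo"


def build_geocode_url(arguments: str, key: str) -> str:
    """Turn a `geocode` tool call's JSON arguments into a request URL."""
    args = json.loads(arguments)
    params = {"key": key, "address": args["address"]}
    if args.get("city"):
        params["city"] = args["city"]
    return f"{GEOCODE_URL}?{urlencode(params)}"


def parse_geocode_response(content: str) -> tuple[float, float, str]:
    """Extract (longitude, latitude, formatted_address) from the first result."""
    data = json.loads(content)
    if data.get("status") != "1" or not data.get("geocodes"):
        raise ValueError("geocoding failed or returned no results")
    first = data["geocodes"][0]
    # Amap returns location as "longitude,latitude"
    lng, lat = (float(v) for v in first["location"].split(","))
    return lng, lat, first["formatted_address"]


url = build_geocode_url('{"address": "北京故宫", "city": "北京"}', "YOUR_AMAP_KEY")
result = parse_geocode_response(
    '{"status": "1", "geocodes": [{"location": "116.397026,39.918058", '
    '"formatted_address": "北京市东城区景山前街4号"}]}'
)
print(result)  # → (116.397026, 39.918058, '北京市东城区景山前街4号')
```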
Provider: maas
Created: 2025-08-22