Amap MCP Tool Calling Training Data

Name: Amap MCP Tool Calling Training Data
Creator: maas
Published: 2026-01-07 10:24:37
License: No description available

ModelScope · updated 2026-01-07 · indexed 2025-08-23

Download link:

https://modelscope.cn/datasets/holmeslei/Amap_MCP_FC

Resource description:


# Amap MCP Tool Calling Training Dataset

## Dataset Overview

This dataset is designed for training large language models (LLMs) to call Amap tools through the Model Context Protocol (MCP). It contains real-world tool calling scenarios covering common Amap API use cases such as geographic location queries, route planning, and POI (Point of Interest) search.

## Dataset Features

- 🎯 **Real-world scenarios**: Built from actual Amap API usage scenarios
- 🔧 **Complete workflow**: Covers the full dialogue flow, from the user query through tool calls to the final response
- 📝 **Hermes format**: Uses the Hermes function calling prompt template, which simplifies model training and deployment
- 🌐 **Optimized for Chinese**: Specifically tuned for Chinese-language geographic queries

## Dataset Structure

The dataset contains three core fields:

### 1. `tools` Field

Definitions of the Amap API tools that may be used. Each tool includes:

- Tool name
- Tool description
- Parameter definitions and types
- Return value description

Example tools:

- Geocoding (address to coordinates)
- Reverse geocoding (coordinates to address)
- POI search
- Route planning
- Weather query

### 2. `query` Field

The user's original query, including but not limited to:

- Location query: "What is the exact location of Tian'anmen in Beijing?"
- Route planning: "How do I get from Shanghai Hongqiao Airport to the Bund?"
- POI search: "What good restaurants are nearby?"
- Compound query: "Find hotels near West Lake in Hangzhou, and plan a route there from the railway station"

### 3. `messages` Field

The complete list of dialogue messages, including:

- **User message**: The user's original query
- **Assistant reasoning**: The model's reasoning process (optional)
- **Tool call**: The specific API call request
- **Tool result**: The result returned by the API
- **Final reply**: The assistant's answer, generated from the tool results

## Data Format Example

```json
{
  "tools": [
    {
      "name": "geocode",
      "description": "Converts an address to latitude and longitude coordinates",
      "parameters": {
        "type": "object",
        "properties": {
          "address": {
            "type": "string",
            "description": "The address to be converted"
          },
          "city": {
            "type": "string",
            "description": "City name (optional)"
          }
        },
        "required": ["address"]
      }
    }
  ],
  "query": "What are the longitude and latitude of the Forbidden City in Beijing?",
  "messages": [
    {
      "role": "user",
      "content": "What are the longitude and latitude of the Forbidden City in Beijing?"
    },
    {
      "role": "assistant",
      "content": "Let me look up the longitude and latitude of the Forbidden City in Beijing for you.",
      "tool_calls": [
        {
          "id": "call_001",
          "type": "function",
          "function": {
            "name": "geocode",
            "arguments": "{\"address\": \"The Forbidden City in Beijing\", \"city\": \"Beijing\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_001",
      "content": "{\"status\": \"1\", \"geocodes\": [{\"location\": \"116.397026,39.918058\", \"formatted_address\": \"No. 4 Jingshan Front Street, Dongcheng District, Beijing\"}]}"
    },
    {
      "role": "assistant",
      "content": "The coordinates of the Forbidden City in Beijing are:\n- Longitude: 116.397026\n- Latitude: 39.918058\n\nThe Forbidden City is located at No. 4 Jingshan Front Street, Dongcheng District, Beijing; this point lies at the center of the Forbidden City."
    }
  ]
}
```

## Hermes Function Calling Format

This dataset uses the Hermes format for function calling, a widely used format with the following characteristics:

- Tool calls are stored in the `tool_calls` field of assistant messages
- Parallel calls to multiple tools are supported
- Every call is tracked by an ID (`id` on the call, `tool_call_id` on the tool result)
- Parameters are passed in structured form

## Usage

### Install Dependencies

```bash
pip install datasets
```

### Load the Dataset

```python
from datasets import load_dataset

# Load the dataset from Hugging Face.
# NOTE: "your-username/amap-mcp-tools-dataset" is a placeholder from the original card;
# the dataset itself is published on ModelScope as holmeslei/Amap_MCP_FC.
dataset = load_dataset("your-username/amap-mcp-tools-dataset")

# View dataset information
print(dataset)

# Access the training split
train_data = dataset["train"]

# View the first sample
print(train_data[0])
```

### Data Preprocessing Example

```python
import json


def format_for_training(example):
    """Format one sample into a training-ready message list."""
    tools = example["tools"]
    # Serialize the tool definitions as JSON (not a Python repr) unless already a string
    if not isinstance(tools, str):
        tools = json.dumps(tools, ensure_ascii=False)

    # System prompt containing the tool definitions
    system_prompt = f"You have access to the following tools:\n{tools}"
    formatted_messages = [{"role": "system", "content": system_prompt}]

    # Append the dialogue messages
    formatted_messages.extend(example["messages"])
    return {"formatted_messages": formatted_messages}


# Apply preprocessing
processed_dataset = dataset.map(format_for_training)
```

## Training Recommendations

1. **Model selection**: Use a base model that supports function calling, such as Qwen or GLM.
2. **Hyperparameters**:
   - Learning rate: 1e-5 to 5e-5
   - Batch size: 4-16, adjusted to available GPU memory
   - Epochs: 3-5
3. **Evaluation metrics**:
   - Tool call accuracy
   - Parameter extraction accuracy
   - End-to-end task completion rate

## Dataset Statistics

- **Total samples**: [Dataset Size]
- **Average dialogue turns**: 3-5
- **Number of tool types**: [Number of Tools]
- **Covered scenarios**: Geocoding, route planning, POI search, weather queries, and more

## License

This dataset is licensed under the Apache-2.0 license.
When using this dataset, please comply with:

- The Amap API Terms of Service
- Applicable data protection regulations

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{amap_mcp_tools_2024,
  title={Amap MCP Tool Calling Training Dataset},
  author={Your Name},
  year={2024},
  publisher={Hugging Face}
}
```

**Note**: Models trained on this dataset need a valid Amap API key to make real calls to the Amap API.
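
The card says samples follow the Hermes function calling template, but it stores them as structured `tools` and `messages` fields. The sketch below shows one way to render a sample into Hermes-style text. It assumes the common Hermes/ChatML conventions (`<|im_start|>`/`<|im_end|>` role markers, tool signatures inside `<tools></tools>`, calls inside `<tool_call></tool_call>`, results inside `<tool_response></tool_response>` in a `tool` turn). Those tag names, the system-prompt wording, and the helper `to_hermes_text` are assumptions, not taken from this dataset. In practice a model's own chat template (for example via a tokenizer's chat-template support) should take precedence.

```python
import json


def to_hermes_text(tools, messages):
    """Render one sample into Hermes-style ChatML text (hypothetical helper).

    Assumed conventions: ChatML role markers, <tools>, <tool_call> and
    <tool_response> tags. The system prompt wording is illustrative only.
    """
    def block(role, content):
        return f"<|im_start|>{role}\n{content}<|im_end|>"

    system = (
        "You are a function calling AI model. Function signatures are provided "
        "within <tools></tools> XML tags:\n<tools>\n"
        + "\n".join(json.dumps(t, ensure_ascii=False) for t in tools)
        + "\n</tools>"
    )
    out = [block("system", system)]
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # Each call becomes a <tool_call> JSON object with the name and decoded arguments
            calls = []
            for call in msg["tool_calls"]:
                fn = call["function"]
                payload = {"name": fn["name"], "arguments": json.loads(fn["arguments"])}
                calls.append("<tool_call>\n" + json.dumps(payload, ensure_ascii=False) + "\n</tool_call>")
            text = msg.get("content") or ""
            out.append(block("assistant", "\n".join(([text] if text else []) + calls)))
        elif msg["role"] == "tool":
            # Tool results are wrapped in <tool_response> inside a "tool" turn (assumed)
            out.append(block("tool", "<tool_response>\n" + msg["content"] + "\n</tool_response>"))
        else:
            out.append(block(msg["role"], msg["content"]))
    return "\n".join(out)


# Usage with an abbreviated version of the card's geocode sample
sample_tools = [{
    "name": "geocode",
    "description": "Converts an address to latitude and longitude coordinates",
    "parameters": {
        "type": "object",
        "properties": {"address": {"type": "string"}, "city": {"type": "string"}},
        "required": ["address"],
    },
}]
sample_messages = [
    {"role": "user", "content": "What are the longitude and latitude of the Forbidden City in Beijing?"},
    {"role": "assistant", "content": "Let me look that up.", "tool_calls": [{
        "id": "call_001", "type": "function",
        "function": {"name": "geocode",
                     "arguments": '{"address": "The Forbidden City in Beijing", "city": "Beijing"}'}}]},
    {"role": "tool", "tool_call_id": "call_001",
     "content": '{"status": "1", "geocodes": [{"location": "116.397026,39.918058"}]}'},
    {"role": "assistant", "content": "Longitude 116.397026, latitude 39.918058."},
]
print(to_hermes_text(sample_tools, sample_messages))
```

Decoding `arguments` before re-serializing means a sample with malformed argument JSON fails loudly here instead of silently producing bad training text.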
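
In the data format above, each tool call's `arguments` is a JSON-encoded string, and each tool schema lists its mandatory parameters under `required`. A cheap cleaning pass is therefore to decode every call's arguments and check them against the matching schema. The sketch below is a minimal version. The helper name `check_tool_calls` is hypothetical, and it only checks tool names, JSON validity, required keys, and unknown keys. It does not perform full JSON Schema validation; the `jsonschema` package covers that.

```python
import json


def check_tool_calls(tools, messages):
    """Return a list of problems found in the tool calls of one sample (hypothetical helper)."""
    schemas = {t["name"]: t["parameters"] for t in tools}
    problems = []
    for msg in messages:
        # Messages without calls may lack the key or hold None (e.g. after Arrow loading)
        for call in msg.get("tool_calls") or []:
            name = call["function"]["name"]
            if name not in schemas:
                problems.append(f"{call['id']}: unknown tool {name!r}")
                continue
            try:
                args = json.loads(call["function"]["arguments"])
            except json.JSONDecodeError as exc:
                problems.append(f"{call['id']}: arguments are not valid JSON ({exc.msg})")
                continue
            if not isinstance(args, dict):
                problems.append(f"{call['id']}: arguments must be a JSON object")
                continue
            props = schemas[name].get("properties", {})
            for key in schemas[name].get("required", []):
                if key not in args:
                    problems.append(f"{call['id']}: missing required parameter {key!r}")
            for key in args:
                if key not in props:
                    problems.append(f"{call['id']}: unexpected parameter {key!r}")
    return problems


# Usage: a call that omits the required "address" parameter
tools = [{"name": "geocode", "parameters": {
    "type": "object",
    "properties": {"address": {"type": "string"}, "city": {"type": "string"}},
    "required": ["address"]}}]
bad = [{"role": "assistant", "content": "", "tool_calls": [{
    "id": "call_001", "type": "function",
    "function": {"name": "geocode", "arguments": '{"city": "Beijing"}'}}]}]
print(check_tool_calls(tools, bad))  # → ["call_001: missing required parameter 'address'"]
```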
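
The training recommendations list tool call accuracy and parameter extraction accuracy as evaluation metrics but do not define them. The sketch below uses one plausible sample-level definition, which is an assumption and not the dataset authors' method. A sample counts as a tool hit when the predicted call names match the reference names in order. It counts as a parameter hit when, in addition, the decoded arguments are equal. Parameter accuracy is computed over tool hits only. The function names are hypothetical.

```python
import json


def _parse(arguments):
    try:
        return json.loads(arguments)
    except json.JSONDecodeError:
        return None  # malformed arguments never match a well-formed reference


def extract_calls(message):
    """(name, decoded_arguments) pairs from one assistant message, in order."""
    return [(c["function"]["name"], _parse(c["function"]["arguments"]))
            for c in message.get("tool_calls") or []]


def tool_call_metrics(predictions, references):
    """Sample-level tool/parameter accuracy over paired assistant messages (assumed definitions)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    name_hits = param_hits = 0
    for pred, ref in zip(predictions, references):
        pred_calls, ref_calls = extract_calls(pred), extract_calls(ref)
        if [n for n, _ in pred_calls] == [n for n, _ in ref_calls]:
            name_hits += 1
            # Comparing decoded dicts ignores key order and whitespace in the JSON strings
            if [a for _, a in pred_calls] == [a for _, a in ref_calls]:
                param_hits += 1
    total = len(references)
    return {
        "tool_accuracy": name_hits / total if total else 0.0,
        "param_accuracy": param_hits / name_hits if name_hits else 0.0,
    }


def call_msg(arguments):
    """Build an assistant message with a single geocode call (illustration only)."""
    return {"role": "assistant", "tool_calls": [{"id": "call_001", "type": "function",
            "function": {"name": "geocode", "arguments": arguments}}]}


ref = call_msg('{"address": "北京故宫", "city": "北京"}')
same = call_msg('{"city": "北京", "address": "北京故宫"}')  # same arguments, different key order
wrong_arg = call_msg('{"address": "天安门"}')
no_call = {"role": "assistant", "content": "I don't know."}
print(tool_call_metrics([same, wrong_arg, no_call], [ref, ref, ref]))
# → {'tool_accuracy': 0.6666666666666666, 'param_accuracy': 0.5}
```

Matching names in order is strict for parallel calls. If the order of parallel calls should not matter, sorting both call lists by name before comparing is a simple relaxation.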

Provided by:

maas

Created:

2025-08-22

The content above was collected and summarized by 遇见数据集.