FinMCP-Bench
Source: ModelScope community · Updated 2026-05-24 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/tongyi_dianjin/FinMCP-Bench
Resource description:
## Fin-MCP
<div align="center">
<img alt="image" src="https://raw.githubusercontent.com/aliyun/qwen-dianjin/refs/heads/master/images/dianjin_logo.png">
<p align="center">
💜 <a href="https://tongyi.aliyun.com/dianjin">Qwen DianJin Platform</a> |
🤗 <a href="https://huggingface.co/DianJin">HuggingFace</a> |
🤖 <a href="https://github.com/aliyun/qwen-dianjin">GitHub</a>
</p>
</div>
### Introduction
We propose FinMCP-Bench, a comprehensive benchmark for evaluating LLMs' ability to invoke MCP (Model Context Protocol) tools in financial scenarios. It contains 613 samples across 10 main scenarios and 33 sub-scenarios, drawn from both real and synthetic user queries. Samples fall into three types (single-tool, multi-tool, and multi-turn), allowing evaluation across different levels of task complexity.
The **10 main scenarios** and **33 sub-scenarios** are illustrated in the following figure:

#### Single-tool
Single-tool: resolved with a single tool call in one conversational turn (145 samples).
#### Multi-tool
Multi-tool: involves multiple tool calls within a single conversational turn, which may be sequential or parallel (249 samples).
#### Multi-turn
Multi-turn: spans multiple conversational turns, each potentially invoking one or more tools (219 samples).
#### MCP-All
MCP-All: includes all three sample types (145 + 249 + 219 = 613 samples).
* The MCP tools are provided by Qieman. You can obtain the MCP Server URL and MCP_SCHEMA from the [Qieman website](https://qieman.com/) and the [Qieman MCP Tools page](https://qieman.com/mcp/tools).
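
The three sample types defined above differ only in the number of conversational turns and the number of tool calls per turn. The following is a minimal sketch of that distinction, not the benchmark's own loader or evaluation code. It assumes a hypothetical in-memory representation in which a sample is a list of turns and each turn is a list of tool-call names; the actual dataset schema may differ, and `classify_sample`, `tally`, and the tool names are illustrative only.

```python
from collections import Counter

# Sample counts per type, as reported in this card.
REPORTED_COUNTS = {"single-tool": 145, "multi-tool": 249, "multi-turn": 219}


def classify_sample(turns):
    """Return the sample type for one sample.

    `turns` is a hypothetical representation (an assumption, not the
    dataset's real schema): one inner list of tool-call names per
    conversational turn.
    """
    if len(turns) > 1:
        # More than one conversational turn, regardless of calls per turn.
        return "multi-turn"
    calls = turns[0] if turns else []
    if len(calls) == 1:
        return "single-tool"
    if len(calls) > 1:
        # Several calls in one turn, sequential or parallel.
        return "multi-tool"
    raise ValueError("sample contains no tool calls")


def tally(samples):
    """Count samples by type."""
    return Counter(classify_sample(s) for s in samples)


if __name__ == "__main__":
    # Placeholder tool names; they are not real Qieman MCP tools.
    demo = [
        [["tool_a"]],
        [["tool_a", "tool_b"]],
        [["tool_a"], ["tool_b", "tool_c"]],
    ]
    print(tally(demo))
    print(sum(REPORTED_COUNTS.values()))
```

A tally of this kind over the full MCP-All set could be compared with `REPORTED_COUNTS` as a sanity check after download.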
Provider:
maas
Created:
2025-10-14



