FinMCP-Bench

Name: FinMCP-Bench
Creator: maas
Published: 2026-05-24 01:53:17
License: not specified

ModelScope community | Updated: 2026-05-24 | Indexed: 2025-11-03

Download link:

https://modelscope.cn/datasets/tongyi_dianjin/FinMCP-Bench

Description:

## Fin-MCP

<div align="center">
  <img alt="image" src="https://raw.githubusercontent.com/aliyun/qwen-dianjin/refs/heads/master/images/dianjin_logo.png">
  <p align="center">
    💜 <a href="https://tongyi.aliyun.com/dianjin">Qwen DianJin Platform</a> | 🤗 <a href="https://huggingface.co/DianJin">HuggingFace</a> | 🤖 <a href="https://github.com/aliyun/qwen-dianjin">GitHub</a>
  </p>
</div>

### Introduction

We propose FinMCP-Bench, a comprehensive benchmark for evaluating LLMs' ability to invoke MCP tools in financial scenarios. It contains 613 samples across 10 main scenarios and 33 sub-scenarios, including real and synthetic user queries, with three sample types: single-tool, multi-tool, and multi-turn, allowing evaluation across different levels of task complexity.

The **10 main scenarios** and **33 sub-scenarios** are illustrated in the following figure:

![Scenarios Show](fig/category.png)

#### Single-Tool

Single-tool: resolved with a single tool call in one conversational turn (145 samples).

#### Multi-tool

Multi-tool: involves multiple tool calls within a single conversational turn, which may be sequential or parallel (249 samples).

#### Multi-turn

Multi-turn: spans multiple conversational turns, each potentially invoking one or more tools (219 samples).

#### MCP-All

ALL: includes all three sample types (613 samples).

* The MCP tools are from Qieman. You can obtain the MCP Server URL and MCP_SCHEMA from [qieman](https://qieman.com/) and [qieman MCP Tools](https://qieman.com/mcp/tools).
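The three sample types can be told apart from the number of conversational turns and the number of tool calls per turn. The sketch below illustrates that rule. It assumes a hypothetical representation in which a sample is a list of turns and each turn is a list of tool-call names; the actual dataset schema is not described here and may differ. The tool names in the usage example are made up for illustration and are not taken from the Qieman MCP tool list.

```python
from collections import Counter
from typing import List

# Hypothetical representation (an assumption, not the dataset's real schema):
# a sample is a list of turns, and each turn is a list of tool-call names.
Sample = List[List[str]]

# Sample counts as reported in the README.
REPORTED_COUNTS = {"single-tool": 145, "multi-tool": 249, "multi-turn": 219}


def classify_sample(turns: Sample) -> str:
    """Map a sample to one of the three sample types described above."""
    if not turns:
        raise ValueError("a sample needs at least one turn")
    if len(turns) > 1:
        # Several turns; each turn may invoke zero, one, or more tools.
        return "multi-turn"
    n_calls = len(turns[0])
    if n_calls == 1:
        return "single-tool"
    if n_calls > 1:
        # Several calls in one turn, sequential or parallel.
        return "multi-tool"
    raise ValueError("a single-turn sample needs at least one tool call")


if __name__ == "__main__":
    # Made-up tool names, for illustration only.
    examples = [
        [["get_fund_info"]],
        [["get_fund_info", "get_fund_nav"]],
        [["search_fund"], ["get_fund_nav", "compare_funds"]],
    ]
    print(Counter(classify_sample(s) for s in examples))
    # The three reported subsets add up to the MCP-All total.
    assert sum(REPORTED_COUNTS.values()) == 613
```

Checking the turn count first means a multi-turn sample is classified as such regardless of how many tools each individual turn invokes, which matches the "each potentially invoking one or more tools" wording.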

Provider:

maas

Created:

2025-10-14