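库帕思 Financial Large Language Model Evaluation Dataset (2024 Edition)

Name: 库帕思 Financial Large Language Model Evaluation Dataset (2024 Edition)
Creator: corpus
Published: 2026-05-31 03:31:09
License: not specified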

OpenDataLab, updated 2026-05-31, indexed 2024-12-28

Download link:

https://opendatalab.org.cn/corpus/Corpus

Resource overview:

The Financial Large Language Model Evaluation Dataset (2024 Edition) is aligned with the Guidelines for the Application and Evaluation of Financial Large Language Models (《金融大模型应用测评指南》, T/SAIAS 019—2024). It covers the core domains of the financial industry, draws its data from the working practice of financial institutions, and is intended as a key instrument for evaluating how well large language models perform in financial applications. The dataset is benchmarked against the highest available standards. It is large, well structured and value-aligned, and it meets the financial sector's overall requirements for fresh, diverse and information-dense knowledge.

The evaluation data is organized into five capability areas:

Model basic capabilities: more than 22,000 sentence pairs across 6 dimensions, including computational ability and logical reasoning.
Financial security and value alignment capabilities: more than 2,000 sentence pairs across 13 dimensions, including information content and social order.
Financial risk control capabilities: more than 1,000 sentence pairs across 5 categories of financial risk, including compliance, market and operational risk.
Financial business assistance and expansion capabilities: more than 12,000 sentence pairs across 3 business scenarios, including public opinion analysis and intelligent investment research.
Financial professional cognition capabilities: more than 7,000 sentence pairs across 7 knowledge types, including financial professional knowledge and IPO charts.

The dataset is updated regularly and iterated dynamically. A 1,250-item sample set has been open-sourced on OpenDataLab.

Provider:

corpus

Created:

2024-12-05

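Collected summary

Dataset introduction

Construction

The dataset is built on real business processes and standards in the financial industry and is aligned with industry standards such as the Guidelines for the Application and Evaluation of Financial Large Language Models. Evaluation tasks are designed by financial practitioners, and quality is controlled over multiple rounds that combine manual review with rule-based checks.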

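Features

The dataset covers core financial business scenarios while balancing professional depth, security and compliance, and risk control. Task types are varied and clearly structured, so the dataset can be used to compare systematically how different models perform in financial scenarios.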

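Usage

The dataset can serve as an offline evaluation benchmark for financial large language models, supporting model selection, capability comparison, iterative optimization and safety assessment. Research institutions can also use it for studies of intelligent financial systems.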
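An offline comparison of this kind can be sketched as follows. This is a minimal illustration, not the dataset's official scoring procedure: the record fields `capability`, `question` and `reference`, the toy samples, the `always_a` stand-in model and the exact-match scoring rule are all assumptions made for the example, since the listing does not document the file schema or the metric.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def score_by_capability(
    samples: Iterable[Dict[str, str]],
    model_fn: Callable[[str], str],
) -> Dict[str, float]:
    """Return exact-match accuracy per capability area.

    Assumed (hypothetical) record layout:
    {"capability": ..., "question": ..., "reference": ...}
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sample in samples:
        area = sample["capability"]
        totals[area] += 1
        # Exact match after trimming whitespace; a real evaluation may
        # need a more tolerant comparison.
        if model_fn(sample["question"]).strip() == sample["reference"].strip():
            hits[area] += 1
    return {area: hits[area] / totals[area] for area in totals}


# Toy records standing in for real evaluation items (not from the dataset).
toy_samples = [
    {"capability": "risk_control", "question": "Q1", "reference": "A"},
    {"capability": "risk_control", "question": "Q2", "reference": "B"},
    {"capability": "professional_cognition", "question": "Q3", "reference": "C"},
]


def always_a(question: str) -> str:
    """Placeholder model that answers "A" to everything."""
    return "A"


scores = score_by_capability(toy_samples, always_a)
```

Running the same function over two candidate models yields per-area scores that can be compared directly, which is the model-selection use described above.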

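Background and challenges

Background overview

This dataset is a comprehensive resource for evaluating financial large language models. It is aligned with industry standards, covers the core financial domains, and draws its data from the practice of financial institutions. It addresses five areas: model basic capabilities, financial security and value alignment, risk control, business assistance and expansion, and professional cognition. In total it contains more than 44,000 sentence pairs of evaluation data. It is large, well structured and value-aligned, is intended to assess how effectively financial large language models perform in application, and is updated and iterated regularly. Part of the sample set has been open-sourced.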
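The total of more than 44,000 sentence pairs follows from the per-area figures given in the resource overview. The short check below sums those figures. The dictionary keys are illustrative labels chosen for this example, and each count is the stated lower bound ("more than"), so the result is a lower bound as well.

```python
# Lower-bound pair counts per capability area, as stated in the listing.
# Key names are illustrative labels, not identifiers from the dataset.
approx_pairs = {
    "model_basic_capabilities": 22_000,
    "financial_security_and_value_alignment": 2_000,
    "financial_risk_control": 1_000,
    "financial_business_assistance": 12_000,
    "financial_professional_cognition": 7_000,
}

# Sum of the lower bounds across all five areas.
total = sum(approx_pairs.values())

# Share of each area in the (lower-bound) total.
shares = {area: count / total for area, count in approx_pairs.items()}

print(total)
for area, share in shares.items():
    print(f"{area}: {share:.1%}")
```

Under these figures, model basic capabilities account for half of the data, while the risk control area is the smallest slice.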

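Common scenarios

Typical use cases

Evaluating how different financial large language models differ in professional knowledge comprehension, risk identification, compliance judgment and business assistance.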

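Practical applications

Helping financial institutions select models and evaluate them before deployment, reducing compliance and risk exposure when models are used in real business.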

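Derived and related work

The dataset can be used to build sub-task evaluations or secondary research datasets for financial safety alignment, risk identification, professional knowledge reasoning and similar topics.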

The content above was collected and summarized by 遇见数据集.
