FCMBench-Data
Source: ModelScope community (魔搭社区). Updated 2026-05-09; indexed 2026-05-10.
Download link: https://modelscope.cn/datasets/QFIN/FCMBench-Data
Description:

**FCMBench** is a multimodal benchmark for credit-risk-oriented workflows. It aims to serve as a standard playground that promotes collaboration between academia and industry, providing standardized datasets, prompts, and evaluation scripts across multiple tracks (image, video, speech, agents, etc.).
<p align="center">
💻 <a href="https://github.com/QFIN-tech/FCMBench/tree/main"><b>GitHub</b></a> | 🤗 <a href="https://huggingface.co/datasets/QFIN/FCMBench-Data"><b>Hugging Face</b></a> | 📑 <a href="https://arxiv.org/abs/2601.00150"><b>FCMBench Paper</b></a> | 📑 <a href="https://arxiv.org/abs/2604.25186"><b>FCMBench-Video Paper</b></a> | 🏆 <a href="https://qfin-tech.github.io/FCMBench"><b>Leaderboard</b></a> | 🌐 <a href="./README_cn.md"><b>简体中文</b></a>
</p>
## 🔥 News
- 【**2026. 04. 29**】🎬 We released **FCMBench-Video**, a benchmark for document-video intelligence. It is built from 495 captured atomic videos composed into 1,200 long-form videos, with 11,322 QA instances across 28 document types (bilingual CN/EN). Paper: [arXiv 2604.25186](https://arxiv.org/abs/2604.25186).
- 【**2026. 03. 16**】✨ We released **FCMBench-V1.1**. This version adds English document images and corresponding QA pairs, expands the covered document types to 26, and increases the dataset to 5,198 images and 13,806 QA samples.
- 【**2026. 01. 01**】We are proud to launch **FCMBench-V1.0**, which covers 18 core certificate types with 4,043 privacy-compliant images and 8,446 QA samples. It comprises 3 types of Perception tasks and 4 types of Reasoning tasks, which are crossed with 10 categories of robustness interferences. All tasks and interferences are derived from real-world critical scenarios.
> **Status:** Public release (v1.1).<br>
> **Maintainers:** [奇富科技 / Qfin Holdings](https://github.com/QFIN-tech)<br>
> **Contact:** [yangyehuisw@126.com]
---
## Tracks Overview
| Entry | Inputs | Outputs | Evaluation Script | Leaderboard | Paper | Sample Data |
|---|---|---|---|---|---|---|
| [Vision-Language Track](vision_language) | document images + text prompts (JSONL, one sample per line) | text responses (JSONL, one sample per line) | [evaluation.py](vision_language/evaluation.py) | [Leaderboard](https://qfin-tech.github.io/FCMBench) | [arXiv 2601.00150](https://arxiv.org/abs/2601.00150) | [Examples](https://qfin-tech.github.io/FCMBench/Examples.html) |
| [Video Understanding Track](video_understanding) | document videos + text prompts (JSONL) | text responses (JSONL) | [benchmark_eval.py](video_understanding/benchmark_eval.py) | via [submission](video_understanding/README.md#leaderboard) | [arXiv 2604.25186](https://arxiv.org/abs/2604.25186) | see [README](video_understanding/README.md) |
---
### 1) Vision-Language Track (✅ Available)
Image-based financial document understanding.
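
Both available tracks exchange data as JSONL, one sample per line. The following is a minimal sketch of that read/answer/write loop, not the repository's own tooling. The field names (`prompt`, `response`) and the `answer` callback are hypothetical placeholders; consult the track README for the actual schema.

```python
import json
from pathlib import Path
from typing import Callable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_responses(prompts_path: Path, out_path: Path,
                    answer: Callable[[dict], str]) -> int:
    """Run `answer` on every sample and write one JSON object per line."""
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for sample in read_jsonl(prompts_path):
            # Copy the input record and attach the model output under a
            # hypothetical "response" key (assumption, not the official schema).
            record = dict(sample, response=answer(sample))
            # ensure_ascii=False keeps Chinese text readable in the output file.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Here `answer` would wrap whatever model is being evaluated; it receives the full sample dictionary so it can load the referenced image alongside the text prompt.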
#### Sample Data
Preview sample images and QA examples on the [Examples page](https://qfin-tech.github.io/FCMBench/Examples.html).
#### Reference Model Demo
We also provide access to an interactive demo of our Qfin-VL-Instruct model, which achieves strong performance on FCMBench.
If you are interested in trying the Gradio demo, please contact [yangyehuisw@126.com] with the following information:
- Name
- Affiliation / Organization
- Intended use (e.g., research exploration, benchmarking reference)
- Contact email
Access will be granted on a case-by-case basis.
---
### 2) Video Understanding Track (🎬 Available)
Document-video intelligence benchmark covering document perception, temporal grounding, and evidence-grounded reasoning under realistic handheld capture conditions. Built from 495 captured atomic videos composed into 1,200 long-form videos (20s/40s/60s duration tiers) with 11,322 expert-annotated QA instances across 28 document types in bilingual Chinese/English settings. See the [paper](https://arxiv.org/abs/2604.25186) for full benchmark details and evaluation results on nine Video-MLLMs.
#### Sample Data
Please refer to the [Video Understanding track README](video_understanding/README.md) for the full data composition, instruction file descriptions, and quickstart guide. A stratified 10% subset with ground-truth (`FCMBench-Video_v1.0_small.jsonl`) is available for self-evaluation.
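
For a quick sanity check against the ground-truth subset before running the official script, a per-group exact-match tally can be sketched as below. This is not `benchmark_eval.py` and does not reproduce its scoring; the keys `id`, `answer`, and `task` are assumed for illustration and may differ from the real file layout.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Load every non-empty line of a JSONL file into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def exact_match_by_group(gold: list[dict], pred: list[dict],
                         id_key: str = "id", answer_key: str = "answer",
                         group_key: str = "task") -> dict:
    """Return {group: (correct, total)} using case-insensitive exact match.

    All three key names are hypothetical defaults, not the official schema.
    """
    predictions = {row[id_key]: row.get(answer_key, "") for row in pred}
    tally = defaultdict(lambda: [0, 0])
    for row in gold:
        group = row.get(group_key, "all")
        # A sample with no prediction is counted as wrong.
        guess = predictions.get(row[id_key], "")
        tally[group][1] += 1
        if str(guess).strip().lower() == str(row[answer_key]).strip().lower():
            tally[group][0] += 1
    return {group: (correct, total) for group, (correct, total) in tally.items()}
```

Typical use would be `exact_match_by_group(load_jsonl(Path("FCMBench-Video_v1.0_small.jsonl")), load_jsonl(Path("predictions.jsonl")))`, where `predictions.jsonl` is a hypothetical output file. Exact match is a deliberately strict stand-in; numbers reported for the leaderboard should come from the official evaluation script.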
#### Reference Model Demo
*(TBD)*
---
### 3) Speech Understanding & Generation Track (🕒 Coming Soon)
### 4) Multi-step / Agentic Track (🕒 Coming Soon)
## Citation
**FCMBench (Vision-Language Track):**
```
@misc{yang2026fcmbenchlargescalefinancialcredit,
title={FCMBench: The First Large-scale Financial Credit Multimodal Benchmark for Real-world Applications},
author={Yehui Yang and Dalu Yang and Fangxin Shang and Wenshuo Zhou and Jie Ren and Yifan Liu and Haojun Fei and Qing Yang and Yanwu Xu and Tao Chen},
year={2026},
eprint={2601.00150},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.00150},
}
```
**FCMBench-Video (Video Understanding Track):**
```
@misc{cui2026fcmbenchvideobenchmarkingdocumentvideo,
title={FCMBench-Video: Benchmarking Document Video Intelligence},
author={Runze Cui and Fangxin Shang and Yehui Yang and Qing Yang and Tao Chen},
year={2026},
eprint={2604.25186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.25186},
}
```

Provider: maas
Created: 2026-03-16



