CMMU
ModelScope community. Updated 2026-05-02; indexed 2025-11-08.
Download link:
https://modelscope.cn/datasets/evalscope/CMMU
Resource description:
# CMMU
[**📖 Paper**](https://arxiv.org/abs/2401.14011) | [**🤗 Dataset**](https://huggingface.co/datasets) | [**GitHub**](https://github.com/FlagOpen/CMMU)
This repo contains the evaluation code for the paper [**CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning**](https://arxiv.org/abs/2401.14011).
We have released the validation set of CMMU, which you can download from [here](https://huggingface.co/datasets/BAAI/CMMU). The test set will be hosted on the [flageval platform](https://flageval.baai.ac.cn/), where users can evaluate their models by uploading them.
## Introduction
CMMU is a novel multi-modal benchmark designed to evaluate domain-specific knowledge across seven foundational subjects: math, biology, physics, chemistry, geography, politics, and history. It comprises 3603 questions, incorporating text and images, drawn from a range of Chinese exams. Spanning primary to high school levels, CMMU offers a thorough evaluation of model capabilities across different educational stages.

## Evaluation Results
We have evaluated 10 models on CMMU so far. The results are shown in the following table.
| Model | Val Avg. | Test Avg. |
|----------------------------|----------|-----------|
| InstructBLIP-13b | 0.39 | 0.48 |
| CogVLM-7b | 5.55 | 4.9 |
| ShareGPT4V-7b | 7.95 | 7.63 |
| mPLUG-Owl2-7b | 8.69 | 8.58 |
| LLava-1.5-13b | 11.36 | 11.96 |
| Qwen-VL-Chat-7b | 11.71 | 12.14 |
| Intern-XComposer-7b | 18.65 | 19.07 |
| Gemini-Pro | 21.58 | 22.5 |
| Qwen-VL-Plus | 26.77 | 26.9 |
| GPT-4V | 30.19 | 30.91 |
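
As a quick sanity check on the table, the following minimal sketch compares the model ranking on the validation and test averages and finds the model with the largest gap between the two. The scores are copied from the table above; the helper names `ranking` and `largest_gap` are hypothetical and are not part of the CMMU evaluation code.

```python
# Scores copied from the results table: (model, val_avg, test_avg).
RESULTS = [
    ("InstructBLIP-13b", 0.39, 0.48),
    ("CogVLM-7b", 5.55, 4.9),
    ("ShareGPT4V-7b", 7.95, 7.63),
    ("mPLUG-Owl2-7b", 8.69, 8.58),
    ("LLava-1.5-13b", 11.36, 11.96),
    ("Qwen-VL-Chat-7b", 11.71, 12.14),
    ("Intern-XComposer-7b", 18.65, 19.07),
    ("Gemini-Pro", 21.58, 22.5),
    ("Qwen-VL-Plus", 26.77, 26.9),
    ("GPT-4V", 30.19, 30.91),
]


def ranking(results, column):
    """Return model names from highest to lowest score (column 1 = val, 2 = test)."""
    ordered = sorted(results, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


def largest_gap(results):
    """Return (model, test - val) for the largest absolute val/test difference."""
    model, val, test = max(results, key=lambda row: abs(row[2] - row[1]))
    return model, round(test - val, 2)


if __name__ == "__main__":
    # True when both splits order the models identically.
    print(ranking(RESULTS, 1) == ranking(RESULTS, 2))
    print(largest_gap(RESULTS))
```

With the numbers as listed, the two splits order the models identically, and no model's validation and test averages differ by more than one point.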
## Citation
**BibTeX:**
```bibtex
@article{he2024cmmu,
title={CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning},
author={Zheqi He and Xinya Wu and Pengfei Zhou and Richeng Xuan and Guang Liu and Xi Yang and Qiannan Zhu and Hua Huang},
journal={arXiv preprint arXiv:2401.14011},
year={2024},
}
```
Provider:
maas
Created:
2025-11-07



