
CMMU

ModelScope Community | Updated 2026-05-02 | Indexed 2025-11-08
Download link:
https://modelscope.cn/datasets/evalscope/CMMU
Description:
# CMMU

[**📖 Paper**](https://arxiv.org/abs/2401.14011) | [**🤗 Dataset**](https://huggingface.co/datasets) | [**GitHub**](https://github.com/FlagOpen/CMMU)

This repo contains the evaluation code for the paper [**CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning**](https://arxiv.org/abs/2401.14011).

We have released the validation set of CMMU, which you can download from [here](https://huggingface.co/datasets/BAAI/CMMU). The test set will be hosted on the [flageval platform](https://flageval.baai.ac.cn/), where users can test their models by uploading them.

## Introduction

CMMU is a multi-modal benchmark designed to evaluate domain-specific knowledge across seven foundational subjects: math, biology, physics, chemistry, geography, politics, and history. It comprises 3603 questions that combine text and images, drawn from a range of Chinese exams. Because the questions span primary school through high school, CMMU evaluates model capabilities at each educational stage.

![](assets/example.png)

## Evaluation Results

We have evaluated 10 models on CMMU so far. The results are shown in the following table.

| Model               | Val Avg. | Test Avg. |
|---------------------|----------|-----------|
| InstructBLIP-13b    | 0.39     | 0.48      |
| CogVLM-7b           | 5.55     | 4.9       |
| ShareGPT4V-7b       | 7.95     | 7.63      |
| mPLUG-Owl2-7b       | 8.69     | 8.58      |
| LLava-1.5-13b       | 11.36    | 11.96     |
| Qwen-VL-Chat-7b     | 11.71    | 12.14     |
| Intern-XComposer-7b | 18.65    | 19.07     |
| Gemini-Pro          | 21.58    | 22.5      |
| Qwen-VL-Plus        | 26.77    | 26.9      |
| GPT-4V              | 30.19    | 30.91     |

## Citation

**BibTeX:**

```bibtex
@article{he2024cmmu,
  title={CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning},
  author={Zheqi He and Xinya Wu and Pengfei Zhou and Richeng Xuan and Guang Liu and Xi Yang and Qiannan Zhu and Hua Huang},
  journal={arXiv preprint arXiv:2401.14011},
  year={2024},
}
```
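As a quick sanity check on the Evaluation Results table, the sketch below compares each model's validation and test averages. The scores are copied verbatim from the table; the function names (`rank_models`, `val_test_gaps`) are illustrative only and are not part of the CMMU evaluation code.

```python
# Scores copied from the Evaluation Results table: model -> (val avg, test avg).
SCORES = {
    "InstructBLIP-13b": (0.39, 0.48),
    "CogVLM-7b": (5.55, 4.9),
    "ShareGPT4V-7b": (7.95, 7.63),
    "mPLUG-Owl2-7b": (8.69, 8.58),
    "LLava-1.5-13b": (11.36, 11.96),
    "Qwen-VL-Chat-7b": (11.71, 12.14),
    "Intern-XComposer-7b": (18.65, 19.07),
    "Gemini-Pro": (21.58, 22.5),
    "Qwen-VL-Plus": (26.77, 26.9),
    "GPT-4V": (30.19, 30.91),
}


def rank_models(scores, split):
    """Return model names sorted best-first; split is 0 for val, 1 for test."""
    return sorted(scores, key=lambda name: scores[name][split], reverse=True)


def val_test_gaps(scores):
    """Return test minus val for each model, rounded to two decimals."""
    return {name: round(test - val, 2) for name, (val, test) in scores.items()}


if __name__ == "__main__":
    gaps = val_test_gaps(SCORES)
    for name in rank_models(SCORES, split=1):
        val, test = SCORES[name]
        print(f"{name:<22} val={val:>6} test={test:>6} gap={gaps[name]:+.2f}")
    same_order = rank_models(SCORES, 0) == rank_models(SCORES, 1)
    print("Same ranking on val and test:", same_order)
```

With the numbers as listed, the ranking of the ten models is the same on both splits, and no model's two averages differ by more than one point, which suggests the released validation set tracks the hidden test set reasonably well.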

Provider:
maas
Created:
2025-11-07