Download link:

https://modelscope.cn/datasets/evalscope/CMMU

Resource overview:

# CMMU

[**📖 Paper**](https://arxiv.org/abs/2401.14011) | [**🤗 Dataset**](https://huggingface.co/datasets) | [**GitHub**](https://github.com/FlagOpen/CMMU)

This repo contains the evaluation code for the paper [**CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning**](https://arxiv.org/abs/2401.14011).

We have released the validation set of CMMU, which you can download [here](https://huggingface.co/datasets/BAAI/CMMU). The test set will be hosted on the [flageval platform](https://flageval.baai.ac.cn/), where users can test their models by uploading them.

## Introduction

CMMU is a multi-modal benchmark designed to evaluate domain-specific knowledge across seven foundational subjects: math, biology, physics, chemistry, geography, politics, and history. It comprises 3603 questions, combining text and images, drawn from a range of Chinese exams. Spanning primary to high school levels, CMMU evaluates model capabilities across different educational stages.

![](assets/example.png)

## Evaluation Results

We have evaluated 10 models on CMMU so far. The results are shown in the following table.

| Model               | Val Avg. | Test Avg. |
|---------------------|----------|-----------|
| InstructBLIP-13b    | 0.39     | 0.48      |
| CogVLM-7b           | 5.55     | 4.9       |
| ShareGPT4V-7b       | 7.95     | 7.63      |
| mPLUG-Owl2-7b       | 8.69     | 8.58      |
| LLava-1.5-13b       | 11.36    | 11.96     |
| Qwen-VL-Chat-7b     | 11.71    | 12.14     |
| Intern-XComposer-7b | 18.65    | 19.07     |
| Gemini-Pro          | 21.58    | 22.5      |
| Qwen-VL-Plus        | 26.77    | 26.9      |
| GPT-4V              | 30.19    | 30.91     |

## Citation

**BibTeX:**

```bibtex
@article{he2024cmmu,
  title={CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning},
  author={Zheqi He and Xinya Wu and Pengfei Zhou and Richeng Xuan and Guang Liu and Xi Yang and Qiannan Zhu and Hua Huang},
  journal={arXiv preprint arXiv:2401.14011},
  year={2024},
}
```
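As a quick way to read the evaluation results table above, the following Python sketch finds the top model on a split and computes the gap between the test and validation averages for each model. The scores are copied from the table; the names `RESULTS`, `val_test_gaps`, and `best_model` are illustrative and are not part of the CMMU evaluation code.

```python
# Scores copied from the evaluation results table: (model, val avg, test avg).
RESULTS = [
    ("InstructBLIP-13b", 0.39, 0.48),
    ("CogVLM-7b", 5.55, 4.9),
    ("ShareGPT4V-7b", 7.95, 7.63),
    ("mPLUG-Owl2-7b", 8.69, 8.58),
    ("LLava-1.5-13b", 11.36, 11.96),
    ("Qwen-VL-Chat-7b", 11.71, 12.14),
    ("Intern-XComposer-7b", 18.65, 19.07),
    ("Gemini-Pro", 21.58, 22.5),
    ("Qwen-VL-Plus", 26.77, 26.9),
    ("GPT-4V", 30.19, 30.91),
]


def val_test_gaps(results):
    """Return (model, test - val) pairs, rounded to two decimals."""
    return [(name, round(test - val, 2)) for name, val, test in results]


def best_model(results, split="test"):
    """Return the name of the model with the highest average on a split."""
    column = 1 if split == "val" else 2
    return max(results, key=lambda row: row[column])[0]


if __name__ == "__main__":
    for name, gap in val_test_gaps(RESULTS):
        print(f"{name:22s} {gap:+.2f}")
    print("Best on test:", best_model(RESULTS))
```

A small gap between the two columns suggests that the released validation set is a reasonable proxy for the hosted test set, which is useful because only the validation set can be downloaded.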

Application scenarios: