BAAI/CMMU
Source: Hugging Face · Updated: 2024-01-29 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/BAAI/CMMU
Resource description:
---
license: apache-2.0
task_categories:
- visual-question-answering
language:
- zh
pretty_name: CMMU
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: type
    dtype: string
  - name: grade_band
    dtype: string
  - name: difficulty
    dtype: string
  - name: question_info
    dtype: string
  - name: split
    dtype: string
  - name: subject
    dtype: string
  - name: image
    dtype: string
  - name: sub_questions
    sequence: string
  - name: options
    sequence: string
  - name: answer
    sequence: string
  - name: solution_info
    dtype: string
  - name: id
    dtype: string
  - name: image
    dtype: image
configs:
- config_name: default
  data_files:
  - split: val
    path:
    - "val/*.parquet"
---
# CMMU
[**📖 Paper**](https://arxiv.org/abs/2401.14011) | [**🤗 Dataset**](https://huggingface.co/datasets/BAAI/CMMU) | [**GitHub**](https://github.com/FlagOpen/CMMU)

This repo contains the evaluation code for the paper [**CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning**](https://arxiv.org/abs/2401.14011).

We release the validation set of CMMU, which you can download from [here](https://huggingface.co/datasets/BAAI/CMMU). The test set will be hosted on the [FlagEval platform](https://flageval.baai.ac.cn/), where users can evaluate their models on it by uploading them.
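Because the validation split is published as Parquet files on the Hugging Face Hub (the `configs` entry in the YAML header maps split `val` to `val/*.parquet`), it can most likely be loaded with the `datasets` library. The sketch below assumes `datasets` is installed and the Hub is reachable; `load_cmmu_val`, `subject_shares`, and `print_subject_shares` are hypothetical helper names, not part of the CMMU release.

```python
from collections import Counter

REPO_ID = "BAAI/CMMU"  # dataset id from the card


def load_cmmu_val():
    """Download the public validation split from the Hugging Face Hub.

    Assumes `pip install datasets` and network access. The split name "val"
    comes from the `configs` entry in the card's YAML header.
    """
    from datasets import load_dataset  # imported lazily so subject_shares() has no dependency

    return load_dataset(REPO_ID, split="val")


def subject_shares(subjects):
    """Map each subject label to its fraction of the questions, most frequent first."""
    counts = Counter(subjects)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


def print_subject_shares():
    """Hypothetical convenience wrapper: load the split and print subject shares."""
    val = load_cmmu_val()
    print(f"{len(val)} validation questions")
    # Indexing by column name returns only the `subject` values.
    for label, share in subject_shares(val["subject"]).items():
        print(f"{label}: {share:.1%}")
```

Calling `print_subject_shares()` downloads the split and prints the share of each `subject` label. Reading `val["subject"]` pulls a single column, so the tally does not need to touch the image data.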
## Introduction
CMMU is a novel multi-modal benchmark designed to evaluate domain-specific knowledge across seven foundational subjects: math, biology, physics, chemistry, geography, politics, and history. It comprises 3603 questions, incorporating text and images, drawn from a range of Chinese exams. Spanning primary to high school levels, CMMU offers a thorough evaluation of model capabilities across different educational stages.
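To study a single subject or educational stage, the split can be narrowed on the `subject` and `grade_band` columns listed in the YAML header. This is a sketch built on `datasets.Dataset.filter` with `input_columns`; the helper names are hypothetical, and the label strings in the example (`"math"`, `"high"`) are placeholders, since the card does not document the exact values stored in those columns.

```python
def make_question_filter(subjects=None, grade_bands=None):
    """Build a predicate over (subject, grade_band) values.

    Passing None for either argument means "accept any value". The labels
    must match what the dataset actually stores; "math" and "high" used in
    examples are placeholders, not confirmed CMMU labels.
    """
    subject_set = set(subjects) if subjects is not None else None
    band_set = set(grade_bands) if grade_bands is not None else None

    def keep(subject, grade_band):
        if subject_set is not None and subject not in subject_set:
            return False
        if band_set is not None and grade_band not in band_set:
            return False
        return True

    return keep


def filter_split(dataset, subjects=None, grade_bands=None):
    """Apply the predicate to a Hugging Face `datasets.Dataset`.

    With `input_columns`, the predicate receives only the `subject` and
    `grade_band` values of each row, in that order.
    """
    keep = make_question_filter(subjects, grade_bands)
    return dataset.filter(keep, input_columns=["subject", "grade_band"])
```

For instance, `filter_split(val, subjects=["math"])` would keep only rows whose `subject` equals `"math"`, provided that is how the labels are actually spelled in the data.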

## Evaluation Results
So far, we have evaluated 10 models on CMMU. The results are shown in the table below.

| Model | Val Avg. | Test Avg. |
|----------------------------|----------|-----------|
| InstructBLIP-13b | 0.39 | 0.48 |
| CogVLM-7b | 5.55 | 4.90 |
| ShareGPT4V-7b | 7.95 | 7.63 |
| mPLUG-Owl2-7b | 8.69 | 8.58 |
| LLaVA-1.5-13b | 11.36 | 11.96 |
| Qwen-VL-Chat-7b | 11.71 | 12.14 |
| InternLM-XComposer-7b | 18.65 | 19.07 |
| Gemini-Pro | 21.58 | 22.50 |
| Qwen-VL-Plus | 26.77 | 26.90 |
| GPT-4V | 30.19 | 30.91 |
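The val and test columns can be compared directly to see whether the public validation split is a good proxy for the hosted test set. This small sketch, with scores transcribed by hand from the table, checks whether both splits rank the models in the same order and which model shows the largest val/test gap.

```python
# Scores transcribed from the results table: (model, val avg., test avg.).
RESULTS = [
    ("InstructBLIP-13b", 0.39, 0.48),
    ("CogVLM-7b", 5.55, 4.90),
    ("ShareGPT4V-7b", 7.95, 7.63),
    ("mPLUG-Owl2-7b", 8.69, 8.58),
    ("LLaVA-1.5-13b", 11.36, 11.96),
    ("Qwen-VL-Chat-7b", 11.71, 12.14),
    ("InternLM-XComposer-7b", 18.65, 19.07),
    ("Gemini-Pro", 21.58, 22.50),
    ("Qwen-VL-Plus", 26.77, 26.90),
    ("GPT-4V", 30.19, 30.91),
]


def rank(scores):
    """Return model names ordered from highest to lowest score."""
    return [name for name, _ in sorted(scores, key=lambda item: item[1], reverse=True)]


val_order = rank([(m, v) for m, v, _ in RESULTS])
test_order = rank([(m, t) for m, _, t in RESULTS])
# Test minus val, rounded to the table's two decimals.
gaps = {m: round(t - v, 2) for m, v, t in RESULTS}
largest_gap = max(gaps, key=lambda m: abs(gaps[m]))

print("Same ranking on val and test:", val_order == test_order)
print("Largest val/test gap:", largest_gap, gaps[largest_gap])
```

Run as-is, it reports that the two orderings match and that Gemini-Pro has the largest gap (0.92).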
## Citation
**BibTeX:**
```bibtex
@article{he2024cmmu,
title={CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning},
author={Zheqi He and Xinya Wu and Pengfei Zhou and Richeng Xuan and Guang Liu and Xi Yang and Qiannan Zhu and Hua Huang},
journal={arXiv preprint arXiv:2401.14011},
year={2024},
}
```
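Provided by:
BAAI
Summary of original information
Dataset overview
Dataset sources
- The dataset page links to the paper, the Hugging Face dataset, and the GitHub repository.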
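Dataset type
- A Chinese multimodal benchmark for visual question answering.
Dataset content
- 3,603 exam questions combining text and images, covering seven subjects from primary to high school; the validation set is public and the test set is hosted on FlagEval.
Dataset purpose
- Evaluating how well multimodal models understand and reason about multi-type, domain-specific questions.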
Dataset links
- Paper link
- Hugging Face dataset link
- GitHub link



