BAAI/CMMU

Source: Hugging Face. Updated 2024-01-29. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/BAAI/CMMU
Resource description:
---
license: apache-2.0
task_categories:
- visual-question-answering
language:
- zh
pretty_name: CMMU
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: type
    dtype: string
  - name: grade_band
    dtype: string
  - name: difficulty
    dtype: string
  - name: question_info
    dtype: string
  - name: split
    dtype: string
  - name: subject
    dtype: string
  - name: image
    dtype: string
  - name: sub_questions
    sequence: string
  - name: options
    sequence: string
  - name: answer
    sequence: string
  - name: solution_info
    dtype: string
  - name: id
    dtype: string
  - name: image
    dtype: image
configs:
- config_name: default
  data_files:
  - split: val
    path:
    - "val/*.parquet"
---

# CMMU

[**📖 Paper**](https://arxiv.org/abs/2401.14011) | [**🤗 Dataset**](https://huggingface.co/datasets) | [**GitHub**](https://github.com/FlagOpen/CMMU)

This repo contains the evaluation code for the paper [**CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning**](https://arxiv.org/abs/2401.14011).

We release the validation set of CMMU, which you can download from [here](https://huggingface.co/datasets/BAAI/CMMU). The test set will be hosted on the [flageval platform](https://flageval.baai.ac.cn/), where users can test their models by uploading them.

## Introduction

CMMU is a novel multi-modal benchmark designed to evaluate domain-specific knowledge across seven foundational subjects: math, biology, physics, chemistry, geography, politics, and history. It comprises 3603 questions, incorporating text and images, drawn from a range of Chinese exams. Spanning primary to high school levels, CMMU offers a thorough evaluation of model capabilities across different educational stages.

![](assets/example.png)

## Evaluation Results

We have evaluated 10 models on CMMU so far. The results are shown in the following table.

| Model               | Val Avg. | Test Avg. |
|---------------------|----------|-----------|
| InstructBLIP-13b    | 0.39     | 0.48      |
| CogVLM-7b           | 5.55     | 4.9       |
| ShareGPT4V-7b       | 7.95     | 7.63      |
| mPLUG-Owl2-7b       | 8.69     | 8.58      |
| LLava-1.5-13b       | 11.36    | 11.96     |
| Qwen-VL-Chat-7b     | 11.71    | 12.14     |
| Intern-XComposer-7b | 18.65    | 19.07     |
| Gemini-Pro          | 21.58    | 22.5      |
| Qwen-VL-Plus        | 26.77    | 26.9      |
| GPT-4V              | 30.19    | 30.91     |

## Citation

**BibTeX:**

```bibtex
@article{he2024cmmu,
  title={CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning},
  author={Zheqi He and Xinya Wu and Pengfei Zhou and Richeng Xuan and Guang Liu and Xi Yang and Qiannan Zhu and Hua Huang},
  journal={arXiv preprint arXiv:2401.14011},
  year={2024},
}
```
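The dataset card's front matter declares a single `default` config with a `val` split and string fields such as `subject`, `type`, `grade_band`, and `difficulty`. The following is a minimal sketch of how one might load that split and tally questions per field. It assumes the Hugging Face `datasets` package is installed and the Hub is reachable. The helper names `load_cmmu_val` and `count_by` are hypothetical, and the sample rows only imitate the declared schema: their `type` and `grade_band` values are illustrative guesses, not real CMMU records.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by(records: Iterable[Mapping[str, object]], field: str) -> Counter:
    """Count records per value of `field`, e.g. "subject" or "type"."""
    return Counter(str(record[field]) for record in records)


def load_cmmu_val():
    """Load the validation split declared in the dataset card.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset("BAAI/CMMU", split="val")


# Hypothetical rows mimicking the declared schema; not real CMMU data.
sample_rows = [
    {"id": "a1", "subject": "math", "type": "multiple-choice", "grade_band": "primary"},
    {"id": "a2", "subject": "math", "type": "fill-in-the-blank", "grade_band": "high"},
    {"id": "a3", "subject": "physics", "type": "multiple-choice", "grade_band": "middle"},
]

subject_counts = count_by(sample_rows, "subject")
type_counts = count_by(sample_rows, "type")
```

With network access, `count_by(load_cmmu_val(), "subject")` should apply the same tally to the released validation set, since iterating a loaded split yields one dictionary per row.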
Provider:
BAAI
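Summary of original information

Dataset overview

Dataset source

  • The dataset details page provides links to the paper, the Hugging Face dataset, and the GitHub repository.

Dataset type

  • The specific type of the dataset is not explicitly stated.

Dataset content

  • The specific content of the dataset is not described in detail.

Dataset usage

  • The specific intended use of the dataset is not explicitly stated.

Dataset links

  • Paper link
  • Hugging Face dataset link
  • GitHub link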