Uni-MMMU-Eval

Name: Uni-MMMU-Eval
Creator: maas
Published: 2025-11-27 16:53:23
License: No description provided

ModelScope community: updated 2025-11-27, indexed 2025-11-03

Download link:

https://modelscope.cn/datasets/Vchitect/Uni-MMMU-Eval

Description:

# Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

This is the official dataset of **Uni-MMMU**, a novel benchmark with bidirectionally coupled tasks designed to evaluate how unified models synergistically use generation to aid understanding and understanding to guide generation.

- **Paper:** [Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark](https://huggingface.co/papers/2510.13759)
- **Project Page:** [https://vchitect.github.io/Uni-MMMU-Project/](https://vchitect.github.io/Uni-MMMU-Project/)
- **Code:** [https://github.com/Vchitect/Uni-MMMU](https://github.com/Vchitect/Uni-MMMU)

## Overview

Unified multimodal models aim to jointly enable visual understanding and generation, yet current benchmarks rarely examine their true integration. Existing evaluations either treat the two abilities in isolation or overlook tasks that inherently couple them. To address this gap, we present **Uni-MMMU**, a comprehensive and discipline-aware benchmark that systematically unfolds the bidirectional synergy between generation and understanding across eight reasoning-centric domains, including science, coding, mathematics, and puzzles. Each task is bidirectionally coupled, demanding models to (i) leverage conceptual understanding to guide precise visual synthesis, or (ii) utilize generation as a cognitive scaffold for analytical reasoning. Uni-MMMU incorporates verifiable intermediate reasoning steps, unique ground truths, and a reproducible scoring protocol for both textual and visual outputs. Through extensive evaluation of state-of-the-art unified, generation-only, and understanding-only models, we reveal substantial performance disparities and cross-modal dependencies, offering new insights into **when and how** these abilities reinforce one another, and establishing a reliable foundation for advancing unified models.

## Sample Usage

### Installation

1. Clone the repository.

   ```bash
   git clone https://github.com/Vchitect/Uni-MMMU.git
   cd Uni-MMMU
   ```

2. Install the environment.

   ```bash
   conda update -n base -c defaults conda
   conda create -n ummmu python==3.10 -y
   conda activate ummmu
   pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
   pip install -r requirements.txt
   ```

3. Download the dataset.

   ```bash
   git clone https://huggingface.co/datasets/Vchitect/Uni-MMMU-Eval
   cd Uni-MMMU-Eval
   tar -xvf data.tar -C /path/to/Uni-MMMU
   ```

### Sampling

- Please refer to `./sample_code_example` for details.
- All sampled data will be in `./outputs/model_name`.

### Evaluation

#### Command

```bash
python eval_ummmu.py --model_name model_to_be_eval
```

- Note: This evaluation requires Qwen2.5-VL-72B and Qwen3-32B as evaluators. We recommend running this on a system with at least A100 80GB GPUs to ensure sufficient memory and performance.

## Citation

If you find Uni-MMMU useful for your research, please cite the following paper:

```bibtex
@misc{zou2025unimmmumassivemultidisciplinemultimodal,
  title={Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark},
  author={Kai Zou and Ziqi Huang and Yuhao Dong and Shulin Tian and Dian Zheng and Hongbo Liu and Jingwen He and Bin Liu and Yu Qiao and Ziwei Liu},
  year={2025},
  eprint={2510.13759},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.13759},
}
```
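
Before launching the evaluation command from the Sample Usage section, it can help to confirm that sampling has actually written files to `./outputs/model_name`, since the evaluators (Qwen2.5-VL-72B and Qwen3-32B) are expensive to load. The following is a minimal sketch of such a pre-flight check, not part of the official repository. The helper name `build_eval_command` and its `repo_root` parameter are hypothetical; only the `eval_ummmu.py` script name, the `--model_name` flag, and the `outputs/<model_name>` location come from the README.

```python
import sys
from pathlib import Path


def build_eval_command(model_name: str, repo_root: str = ".") -> list[str]:
    """Return the argv for the README's evaluation command.

    Hypothetical helper: raises FileNotFoundError when
    <repo_root>/outputs/<model_name> is missing or empty.
    """
    outputs_dir = Path(repo_root) / "outputs" / model_name
    # The README states that sampled data is written to ./outputs/model_name.
    if not outputs_dir.is_dir() or not any(outputs_dir.iterdir()):
        raise FileNotFoundError(f"No sampled outputs found in {outputs_dir}")
    # Only the flag documented in the README is passed; no other options are assumed.
    return [sys.executable, "eval_ummmu.py", "--model_name", model_name]
```

The returned list can be handed to `subprocess.run(cmd, check=True, cwd=repo_root)` from the cloned `Uni-MMMU` directory. The check only tests that the directory is non-empty; it makes no assumption about the file names or formats the sampling scripts produce.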

Provider: maas
Created: 2025-10-16
