MME-Unify | Multimodal Understanding Dataset | Model Evaluation Dataset
MME-Unify Dataset Overview
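Dataset Introduction
- Name: MME-Unify
- Type: Evaluation benchmark for multimodal understanding and generation models
- Purpose: Systematically evaluate the capabilities of unified multimodal large language models (U-MLLMs)
- Features:
  - Includes standardized evaluation of traditional tasks as well as evaluation of unified tasks
  - Covers 12 datasets, 10 tasks, and 30 subtasks
  - Introduces 5 novel multimodal reasoning tasks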
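Dataset Contents
- Task types:
  - Image editing
  - Commonsense question answering and image generation
  - Geometric reasoning
  - Conditional image-to-video generation
  - Fine-grained image reconstruction
  - Math reasoning
  - Multiple images and text interlaced
  - Single-image perception and understanding
  - Spot the difference
  - Text-image editing
  - Text-image generation
  - Text-to-video generation
  - Video perception and understanding
  - Visual chain-of-thought (Visual CoT)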
- Data structure:
  MME-Unify
  ├── CommonSense_Questions
  ├── Conditional_Image_to_Video_Generation
  ├── Fine-Grained_Image_Reconstruction
  ├── Math_Reasoning
  ├── Multiple_Images_and_Text_Interlaced
  ├── Single_Image_Perception_and_Understanding
  ├── Spot_Diff
  ├── Text-Image_Editing
  ├── Text-Image_Generation
  ├── Text-to-Video_Generation
  ├── Video_Perception_and_Understanding
  └── Visual_CoT
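After downloading, it can be useful to confirm that a local copy contains every top-level folder listed in the data structure. The following is a minimal sketch, not part of the official tooling: the folder names are copied from the listing above, while the helper name `missing_task_dirs` and the local path `MME-Unify` are assumptions made for illustration.

```python
from pathlib import Path

# Top-level folder names copied from the data structure listing.
EXPECTED_DIRS = [
    "CommonSense_Questions",
    "Conditional_Image_to_Video_Generation",
    "Fine-Grained_Image_Reconstruction",
    "Math_Reasoning",
    "Multiple_Images_and_Text_Interlaced",
    "Single_Image_Perception_and_Understanding",
    "Spot_Diff",
    "Text-Image_Editing",
    "Text-Image_Generation",
    "Text-to-Video_Generation",
    "Video_Perception_and_Understanding",
    "Visual_CoT",
]


def missing_task_dirs(root):
    """Return the expected task folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "MME-Unify" is an assumed local path; point it at your own copy.
    missing = missing_task_dirs("MME-Unify")
    print("missing folders:", missing or "none")
```

An empty result means all twelve folders are present. The check does not inspect the files inside each folder.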
Evaluation Pipeline
- Prompt template: MME-Unify/Prompt.txt
- Evaluation script: MME-Unify/evaluate
- Response format: JSON template (output_test_template.json)
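The schema of output_test_template.json is not described on this page, so the sketch below does not assume any particular field names. It only illustrates one way to check, before running the evaluation script, that a model's response file has the same key structure as the template. The helper name `missing_keys` and the file name `my_model_output.json` are hypothetical.

```python
import json
from pathlib import Path


def missing_keys(template, response, path="$"):
    """List locations where `response` lacks structure present in `template`."""
    problems = []
    if isinstance(template, dict):
        if not isinstance(response, dict):
            return [f"{path}: expected an object"]
        for key, sub in template.items():
            if key not in response:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(missing_keys(sub, response[key], f"{path}.{key}"))
    elif isinstance(template, list):
        if not isinstance(response, list):
            return [f"{path}: expected an array"]
        if template:
            # Use the first template element as the pattern for every item.
            for i, item in enumerate(response):
                problems.extend(missing_keys(template[0], item, f"{path}[{i}]"))
    return problems


def check_response_file(template_path, response_path):
    """Load both JSON files and compare their structure."""
    template = json.loads(Path(template_path).read_text(encoding="utf-8"))
    response = json.loads(Path(response_path).read_text(encoding="utf-8"))
    return missing_keys(template, response)


# Example call (both paths are assumptions about where the files live):
# print(check_response_file("output_test_template.json", "my_model_output.json"))
```

The comparison looks only at keys and nesting, not at values, so a file that passes this check may still be rejected by the evaluation script for other reasons.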
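License
- Usage restriction: academic research only
- Commercial use: prohibited
- Copyright: image copyrights belong to the original authors
- Distribution restriction: the data may not be distributed, published, copied, disseminated, or modified without prior approval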
Citation
@article{xie2025mme,
  title={MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models},
  author={Xie, Wulin and Zhang, Yi-Fan and Fu, Chaoyou and Shi, Yang and Nie, Bingyan and Chen, Hongkai and Zhang, Zhang and Wang, Liang and Tan, Tieniu},
  journal={arXiv preprint arXiv:2504.03641},
  year={2025}
}
Related Resources
- Dataset download: https://huggingface.co/datasets/wulin222/MME-Unify
- Paper: https://arxiv.org/abs/2504.03641
- Project page: https://mme-unify.github.io/
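Since the dataset is hosted on the Hugging Face Hub, one possible way to fetch it is the `snapshot_download` function from the `huggingface_hub` package. This is a sketch under assumptions rather than an official download procedure: the helper names and the `MME-Unify` target folder are illustrative, the package must be installed separately, and the license terms above still apply to any downloaded copy.

```python
from urllib.parse import urlparse

DATASET_URL = "https://huggingface.co/datasets/wulin222/MME-Unify"


def repo_id_from_url(url):
    """Extract '<owner>/<name>' from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    return "/".join(parts[1:])


def download_dataset(local_dir="MME-Unify"):
    """Download the dataset repository into `local_dir` (an assumed path)."""
    # Imported here so the URL helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(DATASET_URL),
        repo_type="dataset",
        local_dir=local_dir,
    )


# Example call (requires network access and `pip install huggingface_hub`):
# path = download_dataset()
```

Passing `repo_type="dataset"` matters here because the Hub treats a repository as a model repository unless told otherwise.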
