CUDAOUTOFMEMORY/MMD-Bench
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/CUDAOUTOFMEMORY/MMD-Bench
Resource description:
---
license: apache-2.0
language:
- en
tags:
- image-degradation
- benchmark
- multimodal
- VLM
- robustness
pretty_name: "MMD-Bench"
size_categories:
- 10K<n<100K
---
# MMD-Bench: Multimodal Model Degradation Benchmark
MMD-Bench is a benchmark for evaluating vision-language models (VLMs) on degraded images. It covers **16 corruption types** across **4 categories** at **3 severity levels**, applied to 6 widely used VLM benchmarks.
> Part of the [CLEAR](https://github.com/haoxiangzhao12138/CLEAR) project.
> [[Paper]](https://arxiv.org/abs/2604.04780) | [[Code]](https://github.com/haoxiangzhao12138/CLEAR) | [[Model]](https://huggingface.co/CUDAOUTOFMEMORY/CLEAR)
## Overview
Existing VLM benchmarks assume clean, high-quality images. In real-world scenarios, images often suffer from noise, blur, compression artifacts, and other degradations. MMD-Bench systematically evaluates how robust VLMs are to these corruptions.
## Corruption Types
| Category | Types |
|----------|-------|
| **Capture** | Lens Blur, Motion Blur, Lens Flare, Dirty Lens, HSV Saturation |
| **Transmission** | JPEG Compression, Block Exchange, Mean Shift, Scan Lines |
| **Environment** | Dark Illumination, Atmospheric Turbulence, Gaussian Noise, Color Diffusion |
| **Post-processing** | Sharpness Change, Graffiti, Watermark Damage |
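When aggregating per-degradation results by category, it can help to have the table above in machine-readable form. The following is a minimal sketch that transcribes the table; the names `CORRUPTIONS` and `category_of` are illustrative and are not part of the dataset or of any released tooling.

```python
# Taxonomy transcribed from the Corruption Types table.
# The dict and function names are hypothetical helpers, not dataset APIs.
CORRUPTIONS = {
    "Capture": ["Lens Blur", "Motion Blur", "Lens Flare", "Dirty Lens", "HSV Saturation"],
    "Transmission": ["JPEG Compression", "Block Exchange", "Mean Shift", "Scan Lines"],
    "Environment": ["Dark Illumination", "Atmospheric Turbulence", "Gaussian Noise", "Color Diffusion"],
    "Post-processing": ["Sharpness Change", "Graffiti", "Watermark Damage"],
}


def category_of(corruption: str) -> str:
    """Return the category that a corruption type belongs to."""
    for category, types in CORRUPTIONS.items():
        if corruption in types:
            return category
    raise KeyError(f"unknown corruption type: {corruption}")


# Sanity check against the counts stated in the introduction.
total_types = sum(len(types) for types in CORRUPTIONS.values())
print(len(CORRUPTIONS), total_types)  # 4 16
```

The per-category counts are 5, 4, 4, and 3, which sum to the 16 types stated above.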
## Severity Levels
| Level | Intensity | Description |
|-------|-----------|-------------|
| **Low** | 0.23 | Mild degradation, mostly recognizable |
| **Mid** | 0.45 | Moderate degradation |
| **High** | 0.9 | Severe degradation, significantly impairs perception |
## Base Benchmarks
MMD-Bench applies controlled degradations to 6 standard VLM benchmarks:
- **MMBench** (DEV_EN_V11)
- **MM-Vet**
- **MMVP**
- **CV-Bench** (2D)
- **MMStar**
- **RealWorldQA**
For each benchmark, 3 mixed-corruption variants are generated (Low / Mid / High), plus 16 per-degradation variants at High intensity. Across the 6 benchmarks this gives 18 mixed-corruption files and 96 per-degradation files (16 types × 6 benchmarks).
## Data Format
Each file is in **TSV format** (compatible with [VLMEvalKit](https://github.com/open-compass/VLMEvalKit)):
- Standard benchmark columns (question, answer, options, etc.)
- `image` column with base64-encoded JPEG image data
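As an illustration of the format described above, here is a minimal sketch that reads TSV rows and decodes the `image` column back to raw bytes using only the Python standard library. The column names `question` and `answer` follow the list above; the sample row is synthetic (a JPEG header followed by padding, not a real picture), and `iter_rows` is a hypothetical helper rather than part of VLMEvalKit.

```python
import base64
import csv
import io

# JPEG data starts with the two-byte SOI marker 0xFFD8.
JPEG_MAGIC = b"\xff\xd8"

# Base64-encoded images can exceed csv's default 128 KiB per-field limit.
csv.field_size_limit(2**31 - 1)


def iter_rows(tsv_text: str):
    """Yield each TSV row as a dict, with the `image` cell decoded to bytes."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        row["image"] = base64.b64decode(row["image"])
        yield row


# Synthetic stand-in for one line of a real file (assumption: real files
# carry more columns, such as answer options).
fake_jpeg = JPEG_MAGIC + b"\xff\xe0" + b"\x00" * 8
sample = (
    "question\tanswer\timage\n"
    f"What is shown?\tA\t{base64.b64encode(fake_jpeg).decode('ascii')}\n"
)

rows = list(iter_rows(sample))
print(rows[0]["question"], rows[0]["image"].startswith(JPEG_MAGIC))
```

For a real file, the same function can be fed the file contents read with `open(path, encoding="utf-8", newline="")`. The decoded bytes can then be passed to an image library of your choice.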
### File Naming Convention
```
{BenchmarkName}_LOW_LEVEL_LOW.tsv # Low severity (mixed corruption types)
{BenchmarkName}_LOW_LEVEL_MID.tsv # Mid severity (mixed corruption types)
{BenchmarkName}_LOW_LEVEL_HIGH.tsv # High severity (mixed corruption types)
{BenchmarkName}_{corruption_type}.tsv # Per-degradation (High severity, single type)
```
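The naming convention can be expressed as two small helpers. This is a sketch under assumptions: the exact `BenchmarkName` and `corruption_type` strings used in the released files are not listed in this card, so the values below are placeholders and should be checked against the repository's file list.

```python
LEVELS = ("LOW", "MID", "HIGH")


def mixed_filename(benchmark: str, level: str) -> str:
    """File name for a mixed-corruption variant at the given severity."""
    level = level.upper()
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    return f"{benchmark}_LOW_LEVEL_{level}.tsv"


def per_degradation_filename(benchmark: str, corruption_type: str) -> str:
    """File name for a single-corruption variant (High severity)."""
    return f"{benchmark}_{corruption_type}.tsv"


# Placeholder identifiers (hypothetical): only the counts matter here.
benchmarks = [f"Bench{i}" for i in range(6)]
corruptions = [f"type{i}" for i in range(16)]

mixed = [mixed_filename(b, lvl) for b in benchmarks for lvl in LEVELS]
single = [per_degradation_filename(b, c) for b in benchmarks for c in corruptions]
print(len(mixed), len(single))  # 18 96
```

The counts match the 3 mixed-corruption variants per benchmark and the 96 per-degradation variants described under Base Benchmarks.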
## Benchmark Results (High Severity)
| Method | MMBench | MM-Vet | MMVP | CV-Bench | MMStar | RealWorldQA | AVG |
|--------|---------|--------|------|----------|--------|-------------|-----|
| GPT-4o-mini | 67.02 | 50.91 | 64.00 | 59.87 | 45.93 | 58.95 | 57.78 |
| Gemini-2.5-Flash | 79.33 | 66.55 | 72.33 | 76.01 | 62.00 | 69.15 | 70.90 |
| Bagel | 67.88 | 45.09 | 65.66 | 64.81 | 55.53 | 58.43 | 59.57 |
| **CLEAR-RL** | **72.52** | **51.97** | **71.33** | **72.25** | **60.67** | **61.05** | **64.97** |
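The AVG column is consistent with an unweighted mean of the six per-benchmark scores. That is an inference from the numbers rather than something the card states, and the short check below reproduces it from the table values, allowing for two-decimal rounding.

```python
# Per-benchmark scores copied from the table, in column order:
# MMBench, MM-Vet, MMVP, CV-Bench, MMStar, RealWorldQA.
scores = {
    "GPT-4o-mini": [67.02, 50.91, 64.00, 59.87, 45.93, 58.95],
    "Gemini-2.5-Flash": [79.33, 66.55, 72.33, 76.01, 62.00, 69.15],
    "Bagel": [67.88, 45.09, 65.66, 64.81, 55.53, 58.43],
    "CLEAR-RL": [72.52, 51.97, 71.33, 72.25, 60.67, 61.05],
}
reported_avg = {
    "GPT-4o-mini": 57.78,
    "Gemini-2.5-Flash": 70.90,
    "Bagel": 59.57,
    "CLEAR-RL": 64.97,
}

# Largest gap between the recomputed mean and the reported AVG.
max_gap = max(
    abs(sum(vals) / len(vals) - reported_avg[name])
    for name, vals in scores.items()
)
print(max_gap < 0.01)  # True: every row agrees to within rounding
```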
## Citation
```bibtex
@misc{hao2026clearunlockinggenerativepotential,
  title={CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models},
  author={Xiangzhao Hao and Zefeng Zhang and Zhenyu Zhang and Linhao Yu and Yao Chen and Yiqian Zhang and Haiyun Guo and Shuohuan Wang and Yu Sun},
  year={2026},
  eprint={2604.04780},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.04780},
}
```
Provider:
CUDAOUTOFMEMORY



