
CUDAOUTOFMEMORY/MMD-Bench

Source: Hugging Face · Updated: 2026-04-09 · Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/CUDAOUTOFMEMORY/MMD-Bench
Dataset description:
---
license: apache-2.0
language:
- en
tags:
- image-degradation
- benchmark
- multimodal
- VLM
- robustness
pretty_name: "MMD-Bench"
size_categories:
- 10K<n<100K
---

# MMD-Bench: Multimodal Model Degradation Benchmark

MMD-Bench is a comprehensive benchmark for evaluating vision-language models (VLMs) under various image degradation conditions. It covers **16 corruption types** across **4 categories** at **3 severity levels**, applied to 6 widely-used VLM benchmarks.

> Part of the [CLEAR](https://github.com/haoxiangzhao12138/CLEAR) project.
>
> [[Paper]](https://arxiv.org/abs/2604.04780) | [[Code]](https://github.com/haoxiangzhao12138/CLEAR) | [[Model]](https://huggingface.co/CUDAOUTOFMEMORY/CLEAR)

## Overview

Existing VLM benchmarks assume clean, high-quality images. In real-world scenarios, images often suffer from noise, blur, compression artifacts, and other degradations. MMD-Bench systematically evaluates how robust VLMs are to these corruptions.

## Corruption Types

| Category | Types |
|----------|-------|
| **Capture** | Lens Blur, Motion Blur, Lens Flare, Dirty Lens, HSV Saturation |
| **Transmission** | JPEG Compression, Block Exchange, Mean Shift, Scan Lines |
| **Environment** | Dark Illumination, Atmospheric Turbulence, Gaussian Noise, Color Diffusion |
| **Post-processing** | Sharpness Change, Graffiti, Watermark Damage |

## Severity Levels

| Level | Intensity | Description |
|-------|-----------|-------------|
| **Low** | 0.23 | Mild degradation, mostly recognizable |
| **Mid** | 0.45 | Moderate degradation |
| **High** | 0.9 | Severe degradation, significantly impairs perception |

## Base Benchmarks

MMD-Bench applies controlled degradations to 6 standard VLM benchmarks:

- **MMBench** (DEV_EN_V11)
- **MM-Vet**
- **MMVP**
- **CV-Bench** (2D)
- **MMStar**
- **RealWorldQA**

For each benchmark, 3 corrupted variants are generated (Low / Mid / High), plus 96 per-degradation variants (16 types x 6 benchmarks at High intensity).

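
The variant counts above can be checked with a short enumeration. This is only an illustrative sketch: the benchmark names, corruption types, and intensity values are copied from the tables in this card, while the Python names (`BENCHMARKS`, `SEVERITIES`, `CORRUPTIONS`, and the two helper functions) are hypothetical and not part of any released tooling.

```python
from itertools import product

# Values copied from the card; the variable names are illustrative assumptions.
BENCHMARKS = ["MMBench", "MM-Vet", "MMVP", "CV-Bench", "MMStar", "RealWorldQA"]
SEVERITIES = {"Low": 0.23, "Mid": 0.45, "High": 0.9}
CORRUPTIONS = {
    "Capture": ["Lens Blur", "Motion Blur", "Lens Flare", "Dirty Lens", "HSV Saturation"],
    "Transmission": ["JPEG Compression", "Block Exchange", "Mean Shift", "Scan Lines"],
    "Environment": ["Dark Illumination", "Atmospheric Turbulence", "Gaussian Noise", "Color Diffusion"],
    "Post-processing": ["Sharpness Change", "Graffiti", "Watermark Damage"],
}


def mixed_variants():
    """(benchmark, severity) pairs: one mixed-corruption variant per severity level."""
    return list(product(BENCHMARKS, SEVERITIES))


def per_degradation_variants():
    """(benchmark, corruption type) pairs: single-type variants at High intensity."""
    all_types = [t for types in CORRUPTIONS.values() for t in types]
    return list(product(BENCHMARKS, all_types))


if __name__ == "__main__":
    print(len(mixed_variants()), "mixed-severity variants")
    print(len(per_degradation_variants()), "per-degradation variants")
```

Enumerating the pairs this way makes it easy to loop an evaluation script over every variant without hard-coding the combinations.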
## Data Format

Each file is in **TSV format** (compatible with [VLMEvalKit](https://github.com/open-compass/VLMEvalKit)):

- Standard benchmark columns (question, answer, options, etc.)
- `image` column with base64-encoded JPEG image data

### File Naming Convention

```
{BenchmarkName}_LOW_LEVEL_LOW.tsv       # Low severity (mixed corruption types)
{BenchmarkName}_LOW_LEVEL_MID.tsv       # Mid severity (mixed corruption types)
{BenchmarkName}_LOW_LEVEL_HIGH.tsv      # High severity (mixed corruption types)
{BenchmarkName}_{corruption_type}.tsv   # Per-degradation (High severity, single type)
```

## Benchmark Results (High Severity)

| Method | MMBench | MM-Vet | MMVP | CV-Bench | MMStar | RealWorldQA | AVG |
|--------|---------|--------|------|----------|--------|-------------|-----|
| GPT-4o-mini | 67.02 | 50.91 | 64.00 | 59.87 | 45.93 | 58.95 | 57.78 |
| Gemini-2.5-Flash | 79.33 | 66.55 | 72.33 | 76.01 | 62.00 | 69.15 | 70.90 |
| Bagel | 67.88 | 45.09 | 65.66 | 64.81 | 55.53 | 58.43 | 59.57 |
| **CLEAR-RL** | **72.52** | **51.97** | **71.33** | **72.25** | **60.67** | **61.05** | **64.97** |

## Citation

```bibtex
@misc{hao2026clearunlockinggenerativepotential,
  title={CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models},
  author={Xiangzhao Hao and Zefeng Zhang and Zhenyu Zhang and Linhao Yu and Yao Chen and Yiqian Zhang and Haiyun Guo and Shuohuan Wang and Yu Sun},
  year={2026},
  eprint={2604.04780},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.04780},
}
```
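
## Reading Sketch

The TSV layout and file naming described under Data Format could be handled roughly as follows. This is a minimal sketch under stated assumptions, not the project's own loader: the function names are hypothetical, the exact `{BenchmarkName}` prefix used in the released files is not stated in this card (so `"MMStar"` below is only a placeholder), and the files are assumed to use default CSV-style quoting with a tab delimiter. Only the `image` column name and the `_LOW_LEVEL_{LOW,MID,HIGH}.tsv` suffixes come from the card.

```python
import base64
import csv
import io

# Base64 image fields can exceed the csv module's default field size limit,
# so raise it to a large value that is safe on all platforms.
csv.field_size_limit(2**31 - 1)


def severity_filename(benchmark: str, level: str) -> str:
    """Build a mixed-corruption file name following the card's naming pattern.

    `benchmark` is a placeholder prefix; check the repository for the real one.
    """
    level = level.upper()
    if level not in {"LOW", "MID", "HIGH"}:
        raise ValueError(f"unknown severity level: {level}")
    return f"{benchmark}_LOW_LEVEL_{level}.tsv"


def iter_rows(handle):
    """Yield each TSV row as a dict keyed by the header line."""
    yield from csv.DictReader(handle, delimiter="\t")


def decode_image(row: dict) -> bytes:
    """Return the raw image bytes stored in the base64-encoded `image` column."""
    return base64.b64decode(row["image"])


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check: JPEG streams start with the bytes FF D8."""
    return data[:2] == b"\xff\xd8"


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file; the bytes are not a valid image.
    fake_image = base64.b64encode(b"\xff\xd8\xff\xe0demo").decode("ascii")
    sample = io.StringIO(f"question\tanswer\timage\nWhat is shown?\tA\t{fake_image}\n")
    for row in iter_rows(sample):
        print(row["question"], looks_like_jpeg(decode_image(row)))
    print(severity_filename("MMStar", "high"))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `iter_rows`; the decoded bytes can then be written to disk or handed to an image library of your choice.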

Provider:
CUDAOUTOFMEMORY