MM-CD/CDBench

Name: MM-CD/CDBench
Creator: MM-CD
Published: 2026-04-02 08:54:28
License: not described on this page (the dataset card below declares cc-by-nc-sa-4.0)

Source: Hugging Face. Updated 2026-04-02; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/MM-CD/CDBench

Resource description:

---
license: cc-by-nc-sa-4.0
task_categories:
- question-answering
size_categories:
- 10K<n<100K
dataset_info:
# - config_name: viewer
#   features:
#   - name: question
#     dtype: string
#   - name: options
#     dtype: string
#   - name: answer
#     dtype: string
#   - name: target_image
#     dtype: image
#   - name: reference_image
#     dtype: image
#   - name: mask
#     dtype: image
configs:
- config_name: viewer
  data_files: "questions.csv"
---

# CDBench: A Comprehensive Multimodal Dataset and Evaluation Benchmark for General Change Detection

[![Demo](https://img.shields.io/badge/🤗-Demo-green)](http://demo.mm-cd.org:8880)

_Demo response may be slow due to limited resources. We appreciate your understanding._

## Introduction

General change detection aims to identify and interpret meaningful differences observed in scenes or objects across different states, playing a critical role in domains such as remote sensing and industrial inspection. While Multimodal Large Language Models (MLLMs) show promise, their capabilities in structured general change detection remain underexplored.

This project introduces **CDBench**, the first comprehensive benchmark for evaluating MLLMs' capabilities in multimodal general change detection across diverse domains. Our benchmark unifies diverse datasets and defines seven structured tasks: two image analysis tasks (Image Content Classification, Image Content Description) and five change analysis tasks (Change Discrimination, Change Localization, Semantic Change Classification/Detection, Change Description, and Change Reasoning).

We also propose the **ChangeAgent** framework, which enhances MLLM cores through retrieval-augmented generation and expert visual guidance, achieving a significantly higher average accuracy of 77.10% on CDBench, compared to 70-71% for leading baseline MLLMs.
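As a rough illustration of how accuracy over the `viewer` config might be computed, the sketch below parses a CSV and scores multiple-choice predictions by exact match. The column names `question`, `options`, and `answer` are taken from the commented-out feature list in the front matter and may not match the real `questions.csv`; the sample rows, the option formatting, and the helper names `load_questions` and `accuracy` are hypothetical. Exact-match scoring is an assumption and is unlikely to apply to the free-text description tasks.

```python
import csv
import io

# Hypothetical rows; the real questions.csv may use a different layout.
SAMPLE_CSV = """question,options,answer
Has a significant change occurred between the two images?,"A. Yes; B. No",A
What is the dominant scene type?,"A. Farmland; B. Urban; C. Water",B
"""


def load_questions(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def accuracy(rows, predictions):
    """Percentage of rows whose predicted option letter matches `answer`.

    Comparison is case-insensitive exact match on the stripped strings.
    """
    if not rows:
        return 0.0
    correct = sum(
        pred.strip().upper() == row["answer"].strip().upper()
        for row, pred in zip(rows, predictions)
    )
    return 100.0 * correct / len(rows)


rows = load_questions(SAMPLE_CSV)
# Hypothetical model outputs: the first is right, the second is wrong.
score = accuracy(rows, ["A", "C"])
print(f"{score:.2f}%")
```

To score real model outputs, replace `SAMPLE_CSV` with the contents of the downloaded file and pass one prediction per row, in row order.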
<div align="center">
  <img src="assets/overview.png" width="80%">
  <br>
  <em>Figure 1: Overview of CDBench benchmark tasks and ChangeAgent architecture</em>
</div>

## What's New

- **First Multimodal Change Detection Benchmark**: Introduced the first comprehensive benchmark specifically designed for evaluating MLLMs on general change detection tasks
- **Seven-Task Evaluation Framework**: Designed seven structured tasks covering image analysis and change analysis, forming a large-scale evaluation set with 70,000+ question-answer pairs
- **Hybrid Cross-Generation Strategy**: Employed LLM-driven content generation, cross-model optimization strategies, and dual-expert human validation to ensure evaluation quality and fairness
- **ChangeAgent Innovative Architecture**: Proposed a hybrid framework combining expert visual guidance and retrieval-augmented generation, significantly improving change detection performance

## Dataset

### Dataset Overview

CDBench integrates diverse datasets from multiple domains, including remote sensing, industrial inspection, and commodity product change/anomaly detection, totaling over 15,000 image pairs. The dataset covers:

- **Remote Sensing Data**: LEVIR-CD, SYSU-CD, CDD
- **Industrial Inspection**: MVTec-AD, MVTec-LOCO, VisA
- **Commodity Inspection**: GoodsAD

Some of the original datasets lack paired reference samples. For these, we use nearest neighbor search to retrieve the most relevant reference samples from the normal categories. For remote sensing data, we use CLIP models to perform scene classification over approximately 50 predefined scene categories, providing richer contextual information for subsequent analysis.
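The two preprocessing steps described above can be sketched with plain cosine similarity over embedding vectors. This is a minimal illustration, not the authors' pipeline: the embedding source, the similarity metric, the toy vectors, the scene labels, and the function names `nearest_reference` and `classify_scene` are all assumptions. In practice the vectors would come from an image encoder (and, for scene classification, a CLIP text encoder applied to category prompts).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_reference(query_emb, normal_embs):
    """Return (index, similarity) of the normal sample closest to the query."""
    sims = [cosine(query_emb, ref) for ref in normal_embs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


def classify_scene(image_emb, label_embs):
    """Pick the label whose embedding is most similar to the image embedding.

    `label_embs` maps each scene label to an embedding vector.
    """
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))


# Toy embeddings standing in for real encoder outputs (hypothetical values).
normal_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
query_emb = [0.5, 0.9, 0.1]
ref_index, ref_sim = nearest_reference(query_emb, normal_embs)

# Hypothetical scene categories; the benchmark uses about 50 of them.
label_embs = {
    "farmland": [1.0, 0.0, 0.0],
    "urban": [0.0, 1.0, 0.0],
    "water": [0.0, 0.0, 1.0],
}
scene = classify_scene([0.1, 0.9, 0.2], label_embs)
print(ref_index, round(ref_sim, 3), scene)
```

A brute-force scan like this is adequate for a few thousand reference images; a larger pool would usually call for an approximate nearest-neighbor index.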
### Dataset Statistics

| Dataset Category | Image Pairs | Tasks | QA Pairs |
|------------------|-------------|-------|----------|
| Remote Sensing | 7,000+ | 7 | 30,000+ |
| Industrial Inspection | 5,000+ | 7 | 30,000+ |
| Commodity Inspection | 2,000+ | 7 | 10,000+ |
| **Total** | **14,000+** | **7** | **70,000+** |

### Data Examples

<div align="center">
  <img src="assets/dataset_examples.png" width="80%">
  <br>
  <em>Figure 2: Examples of seven tasks in the CDBench dataset</em>
</div>

## Seven Core Tasks

1. **Q1: Image Content Classification** - Identify the primary scene type or dominant content category of a given image
2. **Q2: Image Content Description** - Generate comprehensive textual descriptions of images, detailing fine-grained features and spatial layouts
3. **Q3: Change Discrimination** - Determine whether significant changes have occurred between two registered images
4. **Q4: Change Localization** - Identify and describe the specific pixel or regional locations where changes occur
5. **Q5: Semantic Change Classification** - Classify the nature or semantic category of detected changes
6. **Q6: Change Description** - Provide concise textual summaries of identified change events
7. **Q7: Change Reasoning** - Infer plausible causes or root reasons for detected changes

## Demo

We provide an online demonstration system accessible through the following links:

- 🌐 [Official Project Website](http://mm-cd.org)
- 📊 [Interactive Evaluation Platform](http://demo.mm-cd.org:8880)
<div align="center">
  <img src="assets/demo_screenshot.png" width="80%">
  <br>
  <em>Figure 3: Screenshot of CDBench online evaluation platform</em>
</div>

## Results

### Performance Comparison

| Model | Image Classification (↑) | Image Description (↑) | Change Discrimination (↑) | Change Localization (↑) | Change Classification (↑) | Change Description (↑) | Change Reasoning (↑) | Average (↑) |
|-------|--------------------------|-----------------------|---------------------------|-------------------------|---------------------------|------------------------|----------------------|-------------|
| Qwen-Max-Latest | 57.03 | 79.97 | 55.88 | 72.33 | 61.78 | 64.35 | 65.00 | 65.19 |
| Qwen-plus-latest | 55.12 | 79.04 | 55.87 | 72.18 | 62.77 | 63.70 | 65.93 | 64.94 |
| Claude-3-5-sonnet | 70.17 | 82.27 | 67.47 | 71.74 | 64.96 | 58.48 | 64.15 | 68.46 |
| Gemini-1.5-pro | 72.54 | 83.07 | 66.03 | 73.96 | 66.55 | 67.59 | 68.60 | 71.19 |
| GPT-4o | 91.37 | 85.00 | 69.72 | 69.70 | 56.83 | 61.71 | 60.46 | 70.68 |
| **ChangeAgent (Ours)** | **96.87** | **76.78** | **78.81** | **76.79** | **77.67** | **70.82** | **69.99** | **77.10** |

### ChangeAgent Architecture Advantages

ChangeAgent achieves performance improvements through three core modules:

1. **Expert Visual Change Localization**: Uses a CLIP visual encoder and a change decoder for precise change localization
2. **Knowledge Retrieval and Enhancement**: Injects domain-specific prior knowledge through a RAG module
3. **Integrated Reasoning and Task Execution**: Combines multimodal context for complex semantic reasoning

### Visualization Results

<div align="center">
  <img src="assets/results_comparison.png" width="90%">
  <br>
  <em>Figure 4: Performance comparison between ChangeAgent and existing MLLMs on complex change detection tasks</em>
</div>

## Acknowledgements

We thank the following projects and institutions for their support:

- School of Computer Science and Artificial Intelligence, Fudan University
- School of Information Technology, Shanghai Ocean University
- R&D Department, Shanghai Vision Medical Technology Co., Ltd.
- ACM Multimedia 2025 Conference
- All experts who participated in data annotation and validation
- The open-source community, for providing foundational models such as CLIP and LLaMA

Provided by:

MM-CD
