CMM
ModelScope community · Updated 2026-01-06 · Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/DAMO-NLP-SG/CMM
Resource description:
# The Curse of Multi-Modalities (CMM) Dataset Card
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/609115c79a8bcaa437b234a9/_fSnc78JKOKmUzD9cLWAu.png" width="75%" height="75%">
</p>
## Dataset details
**Dataset type:**
CMM is a curated benchmark designed to evaluate hallucination vulnerabilities in Large Multi-Modal Models (LMMs). It is constructed to rigorously test LMMs’ capabilities across visual, audio, and language modalities, focusing on hallucinations arising from inter-modality spurious correlations and uni-modal over-reliance.
**Dataset detail:**
CMM introduces 2,400 probing questions across 1,200 carefully selected video, audio, and video-audio samples from WebVid, AudioCaps, Auto-ACD, and YouTube. Each sample is paired with two questions: one asks about an object or event that actually exists in the sample, and the other asks about one that does not. Together they assess both perception accuracy and hallucination resistance.
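The two-question design can be scored with a short sketch like the one below. It is an illustration only, not the project's evaluation code: the `Probe` record, its field names, and the yes/no answer parsing are all assumptions made for this example. It also assumes that perception accuracy is the share of existent-target questions answered "yes" and hallucination resistance is the share of non-existent-target questions answered "no".

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One probing question. Field names are hypothetical, not the CMM schema."""
    sample_id: str
    target_exists: bool  # True if the queried object/event is really in the sample
    model_answer: str    # raw model reply, assumed to begin with "yes" or "no"


def says_yes(answer: str) -> bool:
    """Crude parsing: treat any reply that starts with 'yes' as a yes."""
    return answer.strip().lower().startswith("yes")


def score(probes):
    """Return (perception_accuracy, hallucination_resistance) as fractions."""
    existent = [p for p in probes if p.target_exists]
    nonexistent = [p for p in probes if not p.target_exists]
    if not existent or not nonexistent:
        raise ValueError("need at least one question of each kind")
    # Correct answer is "yes" when the target exists, "no" when it does not.
    pa = sum(says_yes(p.model_answer) for p in existent) / len(existent)
    hr = sum(not says_yes(p.model_answer) for p in nonexistent) / len(nonexistent)
    return pa, hr


# Two samples, two questions each (made-up answers for illustration).
probes = [
    Probe("a", True, "Yes, a dog is barking."),
    Probe("a", False, "Yes."),   # hallucination: target is absent
    Probe("b", True, "No."),     # missed a real target
    Probe("b", False, "No."),
]
pa, hr = score(probes)
```

Reporting the two numbers separately keeps a model that answers "yes" to everything from looking good: it would reach a perception accuracy of 1.0 but a hallucination resistance of 0.0.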
**Data instructions:**
Please download the raw videos in ./reorg_raw_files.zip. After unzipping, the directory structure should be:
```bash
reorg_raw_files
├── inter-modality_spurious_correlation
│   ├── audio-language/
│   ├── visual-language/
│   └── audio-language/
└── over-reliance_unimodal_priors
    ├── overrely_audio_ignore_visual/
    ├── overrely_visual_ignore_audio/
    └── overrely_language_ignore_visual/
```
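As a rough check that the archive unpacked as expected, a sketch like the following could be used. The folder names are copied from the tree above; because the tree lists `audio-language/` twice, only the distinct names are checked. The helper names `extract` and `missing_dirs` are made up for this example, and the sketch assumes the archive contains a top-level `reorg_raw_files` folder.

```python
import zipfile
from pathlib import Path

# Folder names copied from the tree above (distinct names only).
EXPECTED = {
    "inter-modality_spurious_correlation": {
        "audio-language",
        "visual-language",
    },
    "over-reliance_unimodal_priors": {
        "overrely_audio_ignore_visual",
        "overrely_visual_ignore_audio",
        "overrely_language_ignore_visual",
    },
}


def extract(zip_path, dest="."):
    """Unzip the archive into dest and return the assumed root folder path."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return Path(dest) / "reorg_raw_files"


def missing_dirs(root):
    """List expected sub-directories that are absent under root."""
    root = Path(root)
    return [
        f"{top}/{sub}"
        for top, subs in EXPECTED.items()
        for sub in sorted(subs)
        if not (root / top / sub).is_dir()
    ]
```

For example, `missing_dirs(extract("reorg_raw_files.zip"))` returns an empty list when every listed folder is present, and the relative paths of any that are not.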
**Evaluation Instruction:**
For detailed evaluation instructions, please refer to our GitHub repo: https://github.com/DAMO-NLP-SG/CMM/.
**Dataset date:**
CMM was released in Oct 2024.
**Paper or resources for more information:**
https://github.com/DAMO-NLP-SG/CMM/
**Where to send questions or comments about the dataset:**
https://github.com/DAMO-NLP-SG/CMM/issues
## Citation
If you find CMM useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{leng2024curse,
title={The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio},
author={Sicong Leng and Yun Xing and Zesen Cheng and Yang Zhou and Hang Zhang and Xin Li and Deli Zhao and Shijian Lu and Chunyan Miao and Lidong Bing},
journal={arXiv},
year={2024},
url={https://arxiv.org/abs/2410.12787}
}
```
## Intended use
**Primary intended uses:**
The primary use of CMM is research on LMMs.
**Primary intended users:**
The primary intended users of the dataset are researchers and hobbyists in computer vision, natural language processing, audio processing, multi-modal learning, machine learning, and artificial intelligence.
Provider: maas
Created: 2025-07-08



