CMM

Name: CMM
Creator: maas
Published: 2026-01-06 16:38:15
License: No description provided

ModelScope community, updated 2026-01-06, indexed 2025-01-25

Download link:

https://modelscope.cn/datasets/DAMO-NLP-SG/CMM


Resource overview:

# The Curse of Multi-Modalities (CMM) Dataset Card

<p align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/609115c79a8bcaa437b234a9/_fSnc78JKOKmUzD9cLWAu.png" width="75%" height="75%"> </p>

## Dataset details

**Dataset type:** CMM is a curated benchmark designed to evaluate hallucination vulnerabilities in Large Multi-Modal Models (LMMs). It tests LMMs' capabilities across the visual, audio, and language modalities, focusing on hallucinations that arise from inter-modality spurious correlations and uni-modal over-reliance.

**Dataset detail:** CMM introduces 2,400 probing questions across 1,200 carefully selected video, audio, and video-audio samples from WebVid, AudioCaps, Auto-ACD, and YouTube. Each sample is paired with two questions, one about an object or event that is present and one about an object or event that is not, so that both perception accuracy and hallucination resistance are assessed.

**Data instructions:** Download the raw videos in `./reorg_raw_files.zip`. The unzipped structure should be:

```bash
reorg_raw_files
├── inter-modality_spurious_correlation
|   ├── audio-language/
|   ├── visual-language/
|   ├── audio-language/
├── over-reliance_unimodal_priors
|   ├── overrely_audio_ignore_visual/
|   ├── overrely_visual_ignore_audio/
|   ├── overrely_language_ignore_visual/
```

**Evaluation instruction:** For detailed evaluation instructions, refer to the GitHub repo: https://github.com/DAMO-NLP-SG/CMM/.

**Dataset date:** CMM was released in Oct 2024.

**Paper or resources for more information:** https://github.com/DAMO-NLP-SG/CMM/

**Where to send questions or comments about the dataset:** https://github.com/DAMO-NLP-SG/CMM/issues

## Citation

If you find CMM useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{leng2024curse,
  title={The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio},
  author={Sicong Leng and Yun Xing and Zesen Cheng and Yang Zhou and Hang Zhang and Xin Li and Deli Zhao and Shijian Lu and Chunyan Miao and Lidong Bing},
  journal={arXiv},
  year={2024},
  url={https://arxiv.org/abs/2410.12787}
}
```

## Intended use

**Primary intended uses:** The primary use of CMM is research on LMMs.

**Primary intended users:** The primary intended users of the dataset are researchers and hobbyists in computer vision, natural language processing, audio processing, multi-modal learning, machine learning, and artificial intelligence.
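The directory layout given in the data instructions can be checked after extraction with a short script. The following is only a minimal sketch, not part of the official CMM tooling: the function name `missing_dirs` and the constant `EXPECTED_LAYOUT` are hypothetical, and the script assumes the archive was extracted into the current working directory. The card lists `audio-language/` twice under the first category; since the intended third name is not stated, the sketch checks only the distinct names that appear.

```python
from pathlib import Path

# Expected sub-directories, copied from the dataset card's directory tree.
# "audio-language" appears twice in the card; it is listed once here because
# the intended third directory name is not given in the card.
EXPECTED_LAYOUT = {
    "inter-modality_spurious_correlation": [
        "audio-language",
        "visual-language",
    ],
    "over-reliance_unimodal_priors": [
        "overrely_audio_ignore_visual",
        "overrely_visual_ignore_audio",
        "overrely_language_ignore_visual",
    ],
}


def missing_dirs(root):
    """Return the expected sub-directories (relative paths) absent under root."""
    root = Path(root)
    missing = []
    for category, subdirs in EXPECTED_LAYOUT.items():
        for sub in subdirs:
            path = root / category / sub
            if not path.is_dir():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Assumption: reorg_raw_files.zip was unzipped in the working directory.
    gaps = missing_dirs("reorg_raw_files")
    print("layout OK" if not gaps else f"missing: {gaps}")
```

An empty result means every listed folder was found. Any returned path points to a folder that is absent, for example because of an incomplete download or extraction into a different location.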


Provider:

maas

Created:

2025-07-08
