
CMM

ModelScope community · Updated 2026-01-06 · Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/DAMO-NLP-SG/CMM
Resource description:
# The Curse of Multi-Modalities (CMM) Dataset Card

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/609115c79a8bcaa437b234a9/_fSnc78JKOKmUzD9cLWAu.png" width="75%" height="75%">
</p>

## Dataset details

**Dataset type:** CMM is a curated benchmark for evaluating hallucination vulnerabilities in Large Multi-Modal Models (LMMs). It tests LMMs across the visual, audio, and language modalities, focusing on hallucinations that arise from inter-modality spurious correlations and uni-modal over-reliance.

**Dataset detail:** CMM introduces 2,400 probing questions across 1,200 carefully selected video, audio, and video-audio samples from WebVid, AudioCaps, Auto-ACD, and YouTube. Each sample is paired with two questions that probe the existence of both existent and non-existent objects or events, so that the benchmark assesses perception accuracy and hallucination resistance together.

**Data instructions:** Download the raw videos in `./reorg_raw_files.zip`. After unzipping, the directory structure should be:

```bash
reorg_raw_files
├── inter-modality_spurious_correlation
|   ├── audio-language/
|   ├── visual-language/
|   ├── audio-language/
├── over-reliance_unimodal_priors
|   ├── overrely_audio_ignore_visual/
|   ├── overrely_visual_ignore_audio/
|   ├── overrely_language_ignore_visual/
```

**Evaluation Instruction:** For detailed evaluation instructions, please refer to our GitHub repo: https://github.com/DAMO-NLP-SG/CMM/.

**Dataset date:** CMM was released in Oct 2024.
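The directory layout given in the data instructions can be sanity-checked after unzipping. The following is a minimal sketch, not part of the official CMM tooling: the names `EXPECTED_LAYOUT` and `find_missing_dirs` are hypothetical, and the folder names are copied from the tree in this card as written.

```python
from pathlib import Path

# Expected layout, copied from the directory tree in this dataset card.
# The card lists "audio-language" twice under the first group; the set
# below keeps only the distinct names as written.
EXPECTED_LAYOUT = {
    "inter-modality_spurious_correlation": {"audio-language", "visual-language"},
    "over-reliance_unimodal_priors": {
        "overrely_audio_ignore_visual",
        "overrely_visual_ignore_audio",
        "overrely_language_ignore_visual",
    },
}


def find_missing_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    missing = []
    for group, subdirs in EXPECTED_LAYOUT.items():
        for name in sorted(subdirs):
            if not (root / group / name).is_dir():
                missing.append(str(Path(group) / name))
    return missing


if __name__ == "__main__":
    # "reorg_raw_files" is the folder produced by unzipping reorg_raw_files.zip.
    missing = find_missing_dirs("reorg_raw_files")
    if missing:
        print("Missing directories:")
        for entry in missing:
            print("  " + entry)
    else:
        print("Directory layout matches the dataset card.")
```

Run the script from the directory that contains `reorg_raw_files`; an empty result from `find_missing_dirs` means every folder named in the card is present.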
**Paper or resources for more information:** https://github.com/DAMO-NLP-SG/CMM/

**Where to send questions or comments about the dataset:** https://github.com/DAMO-NLP-SG/CMM/issues

## Citation

If you find CMM useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{leng2024curse,
  title={The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio},
  author={Sicong Leng and Yun Xing and Zesen Cheng and Yang Zhou and Hang Zhang and Xin Li and Deli Zhao and Shijian Lu and Chunyan Miao and Lidong Bing},
  journal={arXiv},
  year={2024},
  url={https://arxiv.org/abs/2410.12787}
}
```

## Intended use

**Primary intended uses:** The primary use of CMM is research on LMMs.

**Primary intended users:** The primary intended users of the dataset are researchers and hobbyists in computer vision, natural language processing, audio processing, multi-modal learning, machine learning, and artificial intelligence.

Provider: maas

Created: 2025-07-08