MMCBench
Source: arXiv. Updated: 2024-01-22. Indexed: 2024-06-21.
Download link:
https://github.com/sail-sg/MMCBench
Description:
MMCBench is a comprehensive benchmark created by Sea AI Lab. It evaluates the self-consistency of more than 100 popular large multimodal models (LMMs) on four fundamental generation tasks spanning text, image, and speech. The evaluation covers more than 150 model checkpoints and aims to give a clearer picture of how reliable cutting-edge LMMs are. MMCBench focuses on measuring how self-consistent a model's outputs remain when its inputs are subjected to common corruptions, which makes it a key evaluation tool for the real-world deployment of multimodal models.
Provided by:
Sea AI Lab
Created:
2024-01-22



