Multimodal Uncertainty Benchmark (MUB)
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/Yunkai696/MUB
Description:
MUB is a benchmark for evaluating the response uncertainty of multimodal large language models (MLLMs) in misleading scenarios. It includes both explicit and implicit misleading instructions, and its evaluation covers 12 open-source and 5 closed-source MLLMs across a range of domains. The goal is to measure how susceptible MLLMs are to misleading instructions and how well they perform under such conditions.



