Multimodal Uncertainty Benchmark (MUB)

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/Yunkai696/MUB

Overview:

MUB is a benchmark for evaluating the response uncertainty of multimodal large language models (MLLMs) in misleading scenarios. It includes both explicit and implicit misleading instructions and focuses on the performance of 12 open-source and 5 closed-source MLLMs across multiple domains. Its goal is to assess how susceptible MLLMs are to misleading instructions and how well they perform under such conditions.
