five

Irrelevance Robust Visual Question Answering (IR-VQA) (v2)

收藏
IEEE2026-04-17 收录
下载链接:
https://ieee-dataport.org/documents/irrelevance-robust-visual-question-answering-ir-vqa-v2
下载链接
链接失效反馈
官方服务:
资源简介:
Large Vision-Language Models (LVLMs) with \multimodal distractibility,\ where plausible but irrelevant visual or textual inputs cause significant drops in reasoning consistency and lead to unreliable outputs. This paper introduces a comprehensive framework to systematically diagnose, evaluate, and mitigate this critical challenge.  We present three core components: the large-scale IR-VQA benchmark to surface these vulnerabilities across four paradigms; novel diagnostic metrics, Positive Consistency (PC) and Negative Consistency (NC), which move beyond standard accuracy to rigorously measure a model's reasoning stability; and the Relevance-Gated Multimodal Routing (RGMR) mechanism, a novel, lightweight module that proactively and dynamically filters distractions at inference time. Our experiments reveal that state-of-the-art models exhibit significant drops in consistency on IR-VQA. We demonstrate that finetuning on IR-VQA and deploying RGMR substantially improve model robustness where standard prompting fails. Our comprehensive analysis of model behaviors under different types of distractions and the underlying reasoning failures provides a clear path forward for developing more reliable multimodal systems.
提供机构:
Jinhui Yang; Qi Zhao; Ming Jiang
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作