Irrelevance Robust Visual Question Answering (IR-VQA) (v2)

Name: Irrelevance Robust Visual Question Answering (IR-VQA) (v2)
Creator: Jinhui Yang; Qi Zhao; Ming Jiang
License: Not specified

Source: IEEE (indexed 2026-04-17)

Download link:

https://ieee-dataport.org/documents/irrelevance-robust-visual-question-answering-ir-vqa-v2

Resource summary:

Large Vision-Language Models (LVLMs) struggle with "multimodal distractibility," in which plausible but irrelevant visual or textual inputs cause significant drops in reasoning consistency and lead to unreliable outputs. This paper introduces a comprehensive framework to systematically diagnose, evaluate, and mitigate this challenge. The framework has three core components: the large-scale IR-VQA benchmark, which surfaces these vulnerabilities across four paradigms; two new diagnostic metrics, Positive Consistency (PC) and Negative Consistency (NC), which go beyond standard accuracy to measure a model's reasoning stability; and the Relevance-Gated Multimodal Routing (RGMR) mechanism, a lightweight module that proactively and dynamically filters distractions at inference time. Experiments show that state-of-the-art models exhibit significant drops in consistency on IR-VQA, and that fine-tuning on IR-VQA and deploying RGMR substantially improve model robustness where standard prompting fails. An analysis of model behavior under different types of distraction, and of the underlying reasoning failures, points the way toward more reliable multimodal systems.
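The summary does not define PC or NC, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, why a consistency-style metric can reveal problems that accuracy hides. It assumes each question is answered twice, once on the clean input and once with an irrelevant distractor added. `PairedPrediction`, `accuracy`, and `answer_consistency` are hypothetical names introduced for illustration only; they are not part of the IR-VQA release.

```python
from dataclasses import dataclass


@dataclass
class PairedPrediction:
    """Hypothetical record: one question answered on clean and on distracted input."""
    gold: str             # reference answer
    clean_pred: str       # model answer without the distractor
    distracted_pred: str  # model answer with an irrelevant image/text added


def accuracy(preds: list[PairedPrediction], use_distracted: bool = False) -> float:
    """Plain accuracy against the gold answer, on either the clean or the distracted input."""
    hits = sum(
        (p.distracted_pred if use_distracted else p.clean_pred) == p.gold
        for p in preds
    )
    return hits / len(preds)


def answer_consistency(preds: list[PairedPrediction]) -> float:
    """Hypothetical illustration (NOT the paper's PC/NC): share of questions
    whose answer is unchanged when the distractor is added."""
    unchanged = sum(p.clean_pred == p.distracted_pred for p in preds)
    return unchanged / len(preds)


# Toy data: accuracy is the same with and without the distractor,
# yet two of the four answers flip when it is added.
preds = [
    PairedPrediction(gold="cat", clean_pred="cat", distracted_pred="cat"),
    PairedPrediction(gold="red", clean_pred="blue", distracted_pred="red"),
    PairedPrediction(gold="3", clean_pred="3", distracted_pred="4"),
    PairedPrediction(gold="yes", clean_pred="no", distracted_pred="no"),
]

clean_acc = accuracy(preds)                            # 0.5
distracted_acc = accuracy(preds, use_distracted=True)  # 0.5
consistency = answer_consistency(preds)                # 0.5
print(clean_acc, distracted_acc, consistency)
```

In this toy data, accuracy stays at 0.5 with or without the distractor, while the consistency score shows that half of the answers changed. That instability is the kind of behavior accuracy alone cannot detect.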

Provided by:

Jinhui Yang; Qi Zhao; Ming Jiang