Multipanel Visual Question Answering (MultipanelVQA)
Source: arXiv. Indexed 2025-09-30.
Download link:
https://sites.google.com/view/multipanelvqa/home
Resource description:
This dataset contains 6,600 triplets, each consisting of a question, an answer, and a multipanel image, and is designed to challenge models' ability to understand multipanel images. The benchmark uses synthetically generated multipanel images that are constructed to isolate and evaluate the effect of individual factors, such as layout, on that ability. The task is visual question answering.



