
"Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark"

Source: DataCite Commons
Updated: 2025-09-21
Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/evaluating-mllms-multimodal-multi-image-reasoning-benchmark
Description:
With enhanced capabilities and widespread applications, Multimodal Large Language Models (MLLMs) are increasingly required to process and reason over multiple images simultaneously. However, existing MLLM benchmarks focus either on single-image visual reasoning or on multi-image understanding tasks with only final-answer evaluation, leaving the reasoning capabilities of MLLMs over multi-image inputs largely underexplored. To address this gap, we introduce the Multimodal Multi-image Reasoning Benchmark (MMRB), the first benchmark designed to evaluate structured visual reasoning across multiple images. MMRB comprises 92 sub-tasks covering spatial, temporal, and semantic reasoning, with multi-solution, CoT-style annotations generated by GPT-4o and refined by human experts. A derivative subset is designed to evaluate multimodal reward models in multi-image scenarios. To support fast and scalable evaluation, we propose a sentence-level matching framework using open-source LLMs. Extensive baseline experiments on 40 MLLMs, including 9 reasoning-specific models and 8 reward models, demonstrate that open-source MLLMs still lag significantly behind commercial MLLMs in multi-image reasoning tasks. Furthermore, current multimodal reward models are nearly incapable of handling multi-image reward ranking tasks.
Provider:
IEEE DataPort
Created:
2025-09-21