MOAT
Source: arXiv. Indexed 2025-09-30.
Download link:
https://cambrian-yzt.github.io/MOAT
Resource overview:
MOAT is a diverse benchmark of complex real-world vision-language tasks, designed to evaluate the capabilities of large multimodal models (LMMs) and to analyze the performance gap between humans and LMMs. It is built around a taxonomy of 10 fundamental vision-language capabilities and measures the gap by evaluating both LMMs and human participants. The dataset contains 189 questions whose tasks require basic abilities such as reading text, counting, understanding spatial relationships, and following instructions.



