MMRA

Name: MMRA
Creator: maas
Published: 2025-12-05 16:17:24
License: not specified

ModelScope community. Updated: 2025-12-05. Indexed: 2024-10-05.

Download link:

https://modelscope.cn/datasets/m-a-p/MMRA

Description:

# Introduction

We define a multi-image relation association task and curate the **MMRA** benchmark, a **M**ulti-granularity **M**ulti-image **R**elational **A**ssociation benchmark consisting of **1,024** samples. To evaluate mainstream large vision-language models (LVLMs) systematically and comprehensively, we establish a system of associational relations among images that contains **11 subtasks** (e.g., UsageSimilarity, SubEvent) at two granularity levels ("**image**" and "**entity**"), based on the relations in ConceptNet.

Our experiments on the MMRA benchmark show that current multi-image LVLMs have distinct strengths and weaknesses across subtasks. Notably, fine-grained, entity-level multi-image perception tasks are more challenging for LVLMs than image-level tasks, and tasks that involve spatial perception are especially difficult. Our findings also indicate that, while LVLMs perceive image details well, improving their ability to associate information across multiple images hinges on improving the reasoning capability of their language model component. Finally, we explored how well LVLMs perceive image sequences within our multi-image association task. These experiments indicate that most current LVLMs do not adequately model image sequences during pre-training.
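Because the results above are compared per subtask and per granularity level, a small aggregation helper can make that comparison concrete. The following is a minimal sketch, not the authors' evaluation code: the field names `subtask`, `granularity`, `answer`, and `prediction` are hypothetical and may not match the actual MMRA schema, and the toy records (including which granularity each subtask is given) are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key, answer_key="answer",
                      prediction_key="prediction"):
    """Return {group value: accuracy} over a list of dict records.

    All key names are assumptions for this sketch, not the MMRA schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[group_key]
        total[group] += 1
        # Exact string match between the model output and the gold answer
        if record[prediction_key] == record[answer_key]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up toy records; the granularity labels here are arbitrary.
toy_records = [
    {"subtask": "UsageSimilarity", "granularity": "entity",
     "answer": "A", "prediction": "A"},
    {"subtask": "UsageSimilarity", "granularity": "entity",
     "answer": "B", "prediction": "C"},
    {"subtask": "SubEvent", "granularity": "image",
     "answer": "D", "prediction": "D"},
]

per_subtask = accuracy_by_group(toy_records, group_key="subtask")
per_granularity = accuracy_by_group(toy_records, group_key="granularity")
print(per_subtask)
print(per_granularity)
```

Grouping by a configurable key lets the same function report accuracy at either level of the benchmark's relation system. The official scoring lives in the GitHub repository linked in this README.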
![framework](./imgs/framework.png)

![main_result](./imgs/main_result.png)

---

# Evaluation Code

The code for this paper is available on our [GitHub](https://github.com/Wusiwei0410/MMRA/tree/main).

---

# Using Datasets

You can load the dataset with the following code:

```python
import datasets

MMRA_data = datasets.load_dataset('m-a-p/MMRA')['train']
print(MMRA_data[0])  # inspect the first sample
```

---

# Citation

BibTeX:

```
@article{wu2024mmra,
  title={MMRA: A Benchmark for Multi-granularity Multi-image Relational Association},
  author={Wu, Siwei and Zhu, Kang and Bai, Yu and Liang, Yiming and Li, Yizhi and Wu, Haoning and Liu, Jiaheng and Liu, Ruibo and Qu, Xingwei and Cheng, Xuxin and others},
  journal={arXiv preprint arXiv:2407.17379},
  year={2024}
}
```

Provider: maas
Created: 2024-09-30
