Mantis-Eval

Name: Mantis-Eval
Creator: maas
Published: 2026-01-07 22:40:19
License: No description provided

ModelScope community. Updated 2026-01-07; indexed 2024-06-01.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/Mantis-Eval

Resource overview:

## Overview

This is a newly curated dataset to evaluate multimodal language models' capability to reason over multiple images. More details are shown in https://tiger-ai-lab.github.io/Mantis/.

### Statistics

This evaluation dataset contains 217 human-annotated challenging multi-image reasoning problems.

### Leaderboard

We list the current results as follows:

| Models            | Size | Mantis-Eval |
|:------------------|:-----|:------------|
| LLaVA OneVision   | 72B  | 77.60       |
| LLaVA OneVision   | 7B   | 64.20       |
| GPT-4V            | -    | 62.67       |
| Mantis-SigLIP     | 8B   | 59.45       |
| Mantis-Idefics2   | 8B   | 57.14       |
| Mantis-CLIP       | 8B   | 55.76       |
| VILA              | 8B   | 51.15       |
| BLIP-2            | 13B  | 49.77       |
| Idefics2          | 8B   | 48.85       |
| InstructBLIP      | 13B  | 45.62       |
| LLaVA-V1.6        | 7B   | 45.62       |
| CogVLM            | 17B  | 45.16       |
| LLaVA OneVision   | 0.5B | 39.60       |
| Qwen-VL-Chat      | 7B   | 39.17       |
| Emu2-Chat         | 37B  | 37.79       |
| VideoLLaVA        | 7B   | 35.04       |
| Mantis-Flamingo   | 9B   | 32.72       |
| LLaVA-v1.5        | 7B   | 31.34       |
| Kosmos2           | 1.6B | 30.41       |
| Idefics1          | 9B   | 28.11       |
| Fuyu              | 8B   | 27.19       |
| OpenFlamingo      | 9B   | 12.44       |
| Otter-Image       | 9B   | 14.29       |

### Citation

If you are using this dataset, please cite our work with

```
@article{Jiang2024MANTISIM,
  title={MANTIS: Interleaved Multi-Image Instruction Tuning},
  author={Dongfu Jiang and Xuan He and Huaye Zeng and Cong Wei and Max W.F. Ku and Qian Liu and Wenhu Chen},
  journal={Transactions on Machine Learning Research},
  year={2024},
  volume={2024},
  url={https://openreview.net/forum?id=skLtdUVaJa}
}
```
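The card does not say how the Mantis-Eval column is computed. As a minimal sketch, assuming the score is plain accuracy (the percentage of the 217 problems answered correctly), a correct-answer count could be turned into a leaderboard-style number as follows. The function name `accuracy_percent` and the count used in the example are hypothetical and not taken from the dataset or its tooling.

```python
# Assumption: the leaderboard score is accuracy in percent over all problems.
TOTAL_PROBLEMS = 217  # problem count stated in the Statistics section


def accuracy_percent(num_correct: int, total: int = TOTAL_PROBLEMS) -> float:
    """Return the share of correctly answered problems as a percentage."""
    if not 0 <= num_correct <= total:
        raise ValueError("num_correct must be between 0 and total")
    return round(100.0 * num_correct / total, 2)


# Hypothetical count, for illustration only; not a reported result.
print(accuracy_percent(100))  # prints 46.08
```

Under that assumption, one problem is worth roughly 0.46 points, so score gaps smaller than that would correspond to less than a single problem.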

Provider:

maas

Created:

2024-05-29
