# Mantis-Eval
Source: ModelScope community; updated 2026-01-07; indexed 2024-06-01
Download link:
https://modelscope.cn/datasets/TIGER-Lab/Mantis-Eval
Resource description:
## Overview
Mantis-Eval is a newly curated dataset for evaluating how well multimodal language models reason over multiple images. More details are available at https://tiger-ai-lab.github.io/Mantis/.
### Statistics
This evaluation dataset contains 217 human-annotated challenging multi-image reasoning problems.
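Assuming each leaderboard score is accuracy in percent over all 217 problems, rounded to two decimals, every score should equal 100 × (number correct) / 217. For example, Mantis-SigLIP's 59.45 would correspond to 129 correct answers, since 100 × 129 / 217 ≈ 59.447. The sketch below performs this conversion in both directions. `accuracy_pct` and `implied_correct` are hypothetical helper names, not part of any released Mantis tooling.

```python
from typing import Optional

# Assumption: scores are accuracy (%) over all 217 problems, rounded to two decimals.
N_QUESTIONS = 217


def accuracy_pct(correct: int, total: int = N_QUESTIONS) -> float:
    """Percentage of problems answered correctly, rounded to two decimals."""
    return round(100 * correct / total, 2)


def implied_correct(score: float, total: int = N_QUESTIONS) -> Optional[int]:
    """Whole number of correct answers consistent with a two-decimal score, or None if none fits."""
    k = round(score * total / 100)  # nearest whole number of problems
    # k fits if 100*k/total lies within half a hundredth of the reported score.
    if abs(100 * k / total - score) <= 0.005 + 1e-9:
        return k
    return None


if __name__ == "__main__":
    print(implied_correct(59.45))  # 129
    print(accuracy_pct(129))       # 59.45
```

Because the benchmark is small, a single problem shifts a score by about 0.46 points (100 / 217). Keep this in mind when comparing models whose scores differ by only a point or two.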
### Leaderboard
We list the current results as follows:
| Model | Size (parameters) | Mantis-Eval accuracy (%) |
|:------------------|:-----|:------------|
| LLaVA OneVision | 72B | 77.60 |
| LLaVA OneVision | 7B | 64.20 |
| GPT-4V | - | 62.67 |
| Mantis-SigLIP | 8B | 59.45 |
| Mantis-Idefics2 | 8B | 57.14 |
| Mantis-CLIP | 8B | 55.76 |
| VILA | 8B | 51.15 |
| BLIP-2 | 13B | 49.77 |
| Idefics2 | 8B | 48.85 |
| InstructBLIP | 13B | 45.62 |
| LLaVA-V1.6 | 7B | 45.62 |
| CogVLM | 17B | 45.16 |
| LLaVA OneVision | 0.5B | 39.60 |
| Qwen-VL-Chat | 7B | 39.17 |
| Emu2-Chat | 37B | 37.79 |
| VideoLLaVA | 7B | 35.04 |
| Mantis-Flamingo | 9B | 32.72 |
| LLaVA-v1.5 | 7B | 31.34 |
| Kosmos2 | 1.6B | 30.41 |
| Idefics1 | 9B | 28.11 |
| Fuyu | 8B | 27.19 |
| Otter-Image | 9B | 14.29 |
| OpenFlamingo | 9B | 12.44 |
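Some models in the leaderboard share a score, such as InstructBLIP and LLaVA-V1.6 at 45.62. To rank the table programmatically, one option is to parse the pipe-delimited rows and assign standard competition ranks ("1224"), so tied models share a rank. The following is a minimal sketch under that choice of ranking convention. `Entry`, `parse_leaderboard`, and `competition_rank` are hypothetical names, and `TABLE` copies only a few rows from the leaderboard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    model: str
    size: str
    score: float


def parse_leaderboard(markdown: str) -> List[Entry]:
    """Parse `| model | size | score |` rows; rows whose score is not a number are skipped."""
    entries = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        try:
            score = float(cells[2])
        except ValueError:  # header row or :--- alignment row
            continue
        entries.append(Entry(cells[0], cells[1], score))
    return entries


def competition_rank(entries: List[Entry]) -> List[Tuple[int, Entry]]:
    """Sort by score, highest first, and assign '1224'-style ranks: tied scores share a rank."""
    # sorted() is stable even with reverse=True, so tied rows keep their table order.
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    ranked: List[Tuple[int, Entry]] = []
    for i, entry in enumerate(ordered):
        if i > 0 and entry.score == ordered[i - 1].score:
            rank = ranked[-1][0]  # same score as the previous row: reuse its rank
        else:
            rank = i + 1  # otherwise the rank is the 1-based position
        ranked.append((rank, entry))
    return ranked


# A few rows copied from the leaderboard (illustrative subset).
TABLE = """
| Models | Size | Mantis-Eval |
|:------------------|:-----|:------------|
| GPT-4V | - | 62.67 |
| InstructBLIP | 13B | 45.62 |
| LLaVA-V1.6 | 7B | 45.62 |
| CogVLM | 17B | 45.16 |
| OpenFlamingo | 9B | 12.44 |
| Otter-Image | 9B | 14.29 |
"""

if __name__ == "__main__":
    for rank, e in competition_rank(parse_leaderboard(TABLE)):
        print(rank, e.model, e.size, e.score)
```

Competition ranking is only one convention. Dense ranking ("1223") would instead place CogVLM third in this subset.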
### Citation
If you use this dataset, please cite our work:
```bibtex
@article{Jiang2024MANTISIM,
title={MANTIS: Interleaved Multi-Image Instruction Tuning},
author={Dongfu Jiang and Xuan He and Huaye Zeng and Cong Wei and Max W.F. Ku and Qian Liu and Wenhu Chen},
journal={Transactions on Machine Learning Research},
year={2024},
volume={2024},
url={https://openreview.net/forum?id=skLtdUVaJa}
}
```
Provider: maas
Created: 2024-05-29