Mantis-Instruct
Source: ModelScope community (魔搭社区). Updated 2026-05-07; indexed 2024-06-08.
Download link: https://modelscope.cn/datasets/swift/Mantis-Instruct
# Mantis-Instruct
[Paper](https://arxiv.org/abs/2405.01483) | [Website](https://tiger-ai-lab.github.io/Mantis/) | [Github](https://github.com/TIGER-AI-Lab/Mantis) | [Models](https://huggingface.co/collections/TIGER-Lab/mantis-6619b0834594c878cdb1d6e4) | [Demo](https://huggingface.co/spaces/TIGER-Lab/Mantis)
## Summary
Mantis-Instruct is a fully text-image interleaved multimodal instruction tuning dataset
containing 721K examples from 14 subsets and covering multi-image skills including co-reference, reasoning, comparison, and temporal understanding.
**It has been used to train the Mantis model family.**
- Mantis-Instruct has a total of **721K instances**, consisting of **14 subsets** that cover all of these multi-image skills.
- Of the 14 subsets, 10 come from existing datasets: for example, NLVR2, IconQA, etc. for the reasoning skill; DreamSim, Birds-to-Words, etc. for the comparison skill; and NExT-QA and STAR for temporal understanding.
- We additionally curate four new datasets: LLaVA-665k-multi and LRV-multi to cover the co-reference skill, and Contrast-Caption and Multi-VQA to broaden the reasoning skill. Multi-VQA is generated by prompting GPT-4.

## Loading dataset
- to load the dataset without automatically downloading and processing the images
```python
import datasets
dataset = datasets.load_dataset("TIGER-Lab/Mantis-Instruct", "multi_vqa") # revision is 'main' by default
# dataset['train'][0]['images']: image paths relative to the text file; change them to valid paths on your local machine.
```
In this case, you need to manually download the image zips for each subset from the [`script`](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct/tree/script) branch of this repo, and prepend the directory of the extracted images to each image path.
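
The path fix-up described above can be sketched as follows. This is a minimal illustration, not part of the Mantis tooling: the helper name `resolve_image_paths` and the example directory are hypothetical, and the sketch assumes each entry in `images` is either a relative path string or a dict with a `"path"` key. Inspect `dataset['train'][0]['images']` to confirm the actual layout before relying on it.

```python
import os
from typing import Any, List


def resolve_image_paths(images: List[Any], image_root: str) -> List[str]:
    """Prepend image_root to every relative image path of one example."""
    resolved = []
    for item in images:
        # Assumption: an entry is either a path string or a dict with a "path" key.
        rel_path = item["path"] if isinstance(item, dict) else item
        resolved.append(os.path.join(image_root, rel_path))
    return resolved


# Hypothetical directory where the image zip of one subset was extracted.
example_root = os.path.join("data", "mantis", "multi_vqa")
print(resolve_image_paths(["images/0.jpg", {"path": "images/1.jpg"}], example_root))
```

Applied per example (for instance inside `dataset.map`), this yields paths that can be opened directly on the local machine.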
- to load the dataset with automatic downloading and processing of the images (**Please run the following code with datasets==2.18.0**)
```python
import datasets
dataset = datasets.load_dataset("TIGER-Lab/Mantis-Instruct", "multi_vqa", revision="script")
# dataset['train'][0]['images']: processed, valid absolute paths of the downloaded images on your local machine
```
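
Because the `script` revision is tied to `datasets==2.18.0`, it may help to check the installed version before loading. The following is a small sketch using only the standard library; the function name `installed_version_matches` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version_matches(required: str = "2.18.0", package: str = "datasets") -> bool:
    """Return True only if `package` is installed at exactly `required`."""
    try:
        return version(package) == required
    except PackageNotFoundError:
        # The package is not installed at all.
        return False


if not installed_version_matches():
    print("datasets==2.18.0 not found; consider: pip install datasets==2.18.0")
```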
- to load all the subsets without automatically downloading the images
```python
from datasets import get_dataset_config_names, load_dataset

repo_id = "TIGER-Lab/Mantis-Instruct"
config_dataset = {}
# get_dataset_config_names requires the dataset path as its first argument
for config_name in get_dataset_config_names(repo_id):
    config_dataset[config_name] = load_dataset(repo_id, config_name)
```
- to load all the subsets with automatic downloading of the images
```python
from datasets import get_dataset_config_names, load_dataset

repo_id = "TIGER-Lab/Mantis-Instruct"
config_dataset = {}
# get_dataset_config_names requires the dataset path as its first argument
for config_name in get_dataset_config_names(repo_id):
    config_dataset[config_name] = load_dataset(repo_id, config_name, revision="script")
```
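
When looping over all 14 subsets, a single failed download aborts the whole loop. One possible variation, sketched below, records failures per subset and keeps going. The helper `load_subsets` and its `loader` parameter are hypothetical; the loader is passed in so the loop logic can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

REPO_ID = "TIGER-Lab/Mantis-Instruct"


def load_subsets(
    config_names: Iterable[str],
    loader: Callable[..., object],
    revision: Optional[str] = None,
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Load each subset with `loader`, collecting failures instead of stopping."""
    loaded: Dict[str, object] = {}
    failed: Dict[str, Exception] = {}
    # Only pass `revision` when one was requested, so the default branch is used otherwise.
    kwargs = {"revision": revision} if revision else {}
    for name in config_names:
        try:
            loaded[name] = loader(REPO_ID, name, **kwargs)
        except Exception as exc:  # record the error and continue with the next subset
            failed[name] = exc
    return loaded, failed
```

In real use, one would call `load_subsets(get_dataset_config_names(REPO_ID), load_dataset, revision="script")` and then inspect the `failed` dict to retry individual subsets.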
## Citation
```
@article{Jiang2024MANTISIM,
title={MANTIS: Interleaved Multi-Image Instruction Tuning},
author={Dongfu Jiang and Xuan He and Huaye Zeng and Cong Wei and Max W.F. Ku and Qian Liu and Wenhu Chen},
journal={Transactions on Machine Learning Research},
year={2024},
volume={2024},
url={https://openreview.net/forum?id=skLtdUVaJa}
}
```
Provider: maas
Created: 2024-06-05



