DataEngine-InstData
Source: ModelScope community. Updated: 2025-09-03. Indexed: 2024-06-01.
Download link:
https://modelscope.cn/datasets/Shanghai_AI_Laboratory/DataEngine-InstData
Resource description:
# DataEngine-InstData
## Introduction
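DataEngine-InstData is a visual question answering (VQA) dataset generated by GPT-4 from Visual Genome images. It follows the iterative procedure proposed in MLLM-DataEngine, which first identifies a model's weaknesses and then generates data targeted at them. The goal is higher-quality, more targeted SFT data that strengthens specific capabilities of multimodal large language models (MLLMs).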
## Data Format
Each sample generated by MLLM-DataEngine contains a clear, concise instruction and its answer. Each instruction-answer pair is also recast as a multiple-choice question. The data is organized in the following format:
```json
[
{
"instruction": "Where is the man wearing a black backpack positioned in the picture?",
"answer": "The man wearing a black backpack is located at the left side of the image, roughly in the middle between top and bottom",
"short_answer": "Left middle",
"options": ["Top right", "Bottom right", "Bottom left", "Left middle"],
"choice_answer": "D",
"image": "vg/VG_100K_2/2404787.jpg",
"qtype": 4
},
...
]
```
```instruction```: a clear, concise instruction
```answer```: the full answer to the instruction
```short_answer```: a short answer to the instruction
```options```: four answer options for the instruction (exactly one is correct)
```choice_answer```: the letter of the correct option
```image```: path to the Visual Genome image
```qtype```: question type, one of the nine SEED-Bench question types listed below:
```python
{
    1: 'Scene Understanding',
    2: 'Instance Identity',
    3: 'Instance Attributes',
    4: 'Instance Location',
    5: 'Instances Counting',
    6: 'Spatial Relation',
    7: 'Instance Interaction',
    8: 'Visual Reasoning',
    9: 'Text Understanding'
}
```
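The fields above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset's official tooling: the helper names `format_multiple_choice` and `correct_option` are hypothetical, the key `choice_answer` follows the field list above, and the mapping of letters A to D onto the `options` list in order is an assumption inferred from the sample record (where "D" selects the fourth option).

```python
import json

# Question-type names, copied from the qtype table in this card.
QTYPE_NAMES = {
    1: "Scene Understanding",
    2: "Instance Identity",
    3: "Instance Attributes",
    4: "Instance Location",
    5: "Instances Counting",
    6: "Spatial Relation",
    7: "Instance Interaction",
    8: "Visual Reasoning",
    9: "Text Understanding",
}

# Fields documented in the "Data Format" section.
REQUIRED_KEYS = {
    "instruction", "answer", "short_answer",
    "options", "choice_answer", "image", "qtype",
}

LETTERS = "ABCD"  # assumption: letters index the options list in order


def format_multiple_choice(record):
    """Render one record as a lettered multiple-choice prompt."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise KeyError(f"record is missing fields: {sorted(missing)}")
    lines = [record["instruction"]]
    for letter, option in zip(LETTERS, record["options"]):
        lines.append(f"{letter}. {option}")
    return "\n".join(lines)


def correct_option(record):
    """Return the option text selected by the choice_answer letter."""
    return record["options"][LETTERS.index(record["choice_answer"])]


# One record in the documented format (the sample shown in this card).
sample = json.loads("""
[
  {
    "instruction": "Where is the man wearing a black backpack positioned in the picture?",
    "answer": "The man wearing a black backpack is located at the left side of the image, roughly in the middle between top and bottom",
    "short_answer": "Left middle",
    "options": ["Top right", "Bottom right", "Bottom left", "Left middle"],
    "choice_answer": "D",
    "image": "vg/VG_100K_2/2404787.jpg",
    "qtype": 4
  }
]
""")

record = sample[0]
print(format_multiple_choice(record))
print("Answer:", record["choice_answer"], "=", correct_option(record))
print("Question type:", QTYPE_NAMES[record["qtype"]])
```

For a real file, replacing the inline string with `json.load(open(path))` should work as long as the file is a JSON array of such records; the file name inside the repository is not stated in this card, so it is left out here.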
# Citation
```
@misc{zhao2023mllmdataengine,
title={MLLM-DataEngine: An Iterative Refinement Approach for MLLM},
author={Zhiyuan Zhao and Linke Ouyang and Bin Wang and Siyuan Huang and Pan Zhang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
year={2023},
eprint={2308.13566},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
Provider:
maas
Created:
2024-05-28



