UniMM-Chat
Source: ModelScope community. Updated 2025-11-20; indexed 2024-06-08.
Download link:
https://modelscope.cn/datasets/thomas/UniMM-Chat
Description:
# Dataset Card for UniMM-Chat
## Dataset Summary
UniMM-Chat is an **open-source, knowledge-intensive, multi-round multimodal dialogue dataset** powered by GPT-3.5, consisting of **1.1M diverse instructions**.
UniMM-Chat leverages **complementary annotations from different VL datasets** and employs GPT-3.5 to generate multi-turn dialogues corresponding to each image, resulting in **117,238 dialogues**, with an average of **9.89 turns per dialogue**.
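As a rough consistency check on the figures above, the dialogue count multiplied by the average number of turns should land near the stated instruction count. The sketch below assumes each turn contributes one instruction, which the card does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted in the dataset summary.
num_dialogues = 117_238
avg_turns_per_dialogue = 9.89

# Assumption (not stated in the card): one instruction per dialogue turn.
approx_total_turns = num_dialogues * avg_turns_per_dialogue

print(f"{approx_total_turns:,.0f}")  # prints 1,159,484
```

Under that assumption the estimate is about 1.16M turns, which agrees with the "1.1M instructions" figure to within rounding.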
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/HQlP6gRsIq9E2czvmunca.png" alt="fig1" width="60%"/>
</p>
**A diverse set of instructions**:
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/8gmR9FWnCjDIs8IQ7ZxpU.png" alt="fig1" width="30%"/>
</p>
**Superior resulting performance in image understanding and reasoning**:
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/YZceD395gErU7FiVVBljE.png" alt="fig1" width="40%"/>
</p>
## Related Sources
- Paper: https://arxiv.org/abs/2310.00653
- Models Trained on UniMM-Chat: 🥞[Muffin](https://github.com/thunlp/muffin), 🏆[RLHF-V](https://rlhf-v.github.io)
## Usage
```python
from datasets import load_dataset
data = load_dataset("Yirany/UniMM-Chat")
```
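After loading, a quick first step is to check which splits exist and how many rows each one holds. The helper below is a minimal sketch, not part of the dataset's tooling: `summarize_splits` is a hypothetical name, and it assumes `load_dataset` returned a mapping of split name to dataset (a `DatasetDict`), which is the usual result when no `split` argument is passed. The toy data uses made-up field names so the example runs offline; it does not reflect the actual UniMM-Chat schema.

```python
from typing import Dict, Mapping, Sized


def summarize_splits(dataset_dict: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of rows in each split.

    Accepts any mapping of split name -> sized collection. A
    `datasets.DatasetDict` fits this shape because each split supports len().
    """
    return {split: len(rows) for split, rows in dataset_dict.items()}


# Stand-in data so the sketch runs without downloading anything.
# The field names are hypothetical, NOT the confirmed UniMM-Chat schema.
toy = {
    "train": [
        {"image": "img0", "conversation": "..."},
        {"image": "img1", "conversation": "..."},
    ]
}
print(summarize_splits(toy))  # prints {'train': 2}
```

With the `data` object from the snippet above, `summarize_splits(data)` would report the row count per split; printing one record, for example `data["train"][0]` if a `train` split exists, is a simple way to discover the real field names.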
## Citation
```
@article{yu2023reformulating,
title={Reformulating vision-language foundation models and datasets towards universal multimodal assistants},
author={Yu, Tianyu and Hu, Jinyi and Yao, Yuan and Zhang, Haoye and Zhao, Yue and Wang, Chongyi and Wang, Shan and Pan, Yinxv and Xue, Jiao and Li, Dahai and others},
journal={arXiv preprint arXiv:2310.00653},
year={2023}
}
```
Provider:
maas
Created:
2024-05-26



