UniMM-Chat

Name: UniMM-Chat
Creator: Tsinghua University
Published: 2023-10-01 20:35:18
License: Not specified

arXiv; updated 2023-10-01; indexed 2024-06-21

Download link:

https://github.com/thunlp/muffin

Description:

UniMM-Chat was created by Tsinghua University and other institutions. It contains 117,238 dialogues, averaging 9.89 turns each. The dataset merges annotations from several vision-language datasets and uses ChatGPT to generate high-quality, diverse multimodal instructions. Construction proceeds in two steps: images and their complementary annotations are first collected from datasets such as COCO, and ChatGPT then converts that information into knowledge-intensive dialogue data. The dataset is intended to improve multimodal large language models on vision-language tasks, particularly their ability to understand and follow complex instructions.
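The two construction steps described above can be illustrated with a minimal sketch. It only shows the general idea of grouping annotations by image and rendering them as text for a chat model; the record layout, the `other_dataset` source name, the sample annotations, and the prompt wording are all hypothetical and are not taken from the UniMM-Chat release.

```python
from collections import defaultdict

# Hypothetical annotation records: (image_id, source_dataset, annotation_type, text).
# The layout and sample values are illustrative assumptions, not the dataset's schema.
RECORDS = [
    ("img_001", "COCO", "caption", "A man rides a bicycle down a street."),
    ("img_001", "other_dataset", "qa", "Q: What is the man riding? A: A bicycle."),
    ("img_002", "COCO", "caption", "Two dogs play in a park."),
]


def group_by_image(records):
    """Collect every annotation that refers to the same image."""
    grouped = defaultdict(list)
    for image_id, source, kind, text in records:
        grouped[image_id].append((source, kind, text))
    return dict(grouped)


def build_prompt(image_id, annotations):
    """Render merged annotations as text a chat model could turn into a dialogue."""
    lines = [f"Annotations for image {image_id}:"]
    for source, kind, text in annotations:
        lines.append(f"- [{source}/{kind}] {text}")
    # Assumed instruction wording; the actual prompt used by the authors is not given here.
    lines.append("Write a multi-turn dialogue about this image using only the facts above.")
    return "\n".join(lines)


grouped = group_by_image(RECORDS)
prompts = {image_id: build_prompt(image_id, anns) for image_id, anns in grouped.items()}
print(prompts["img_001"])
```

Grouping by image before prompting means each generated dialogue can draw on every annotation available for that image, which is the property the description attributes to the dataset.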

Provider:

Tsinghua University

Created:

2023-10-01
