ShareGPT4V: A Large-Scale, High-Quality Image-Text Dataset

Source: HyperAI (超神经) · Updated: 2024-06-07 · Indexed: 2024-06-29

Download link:

https://hyper.ai/cn/datasets/32313

Overview:

ShareGPT4V is a high-quality dataset of image-text pairs used to train vision-language models (VLMs) and improve their image understanding and text generation. It contains 1.2 million image-text pairs, which align visual and language features and strengthen a model's instruction-following ability, and it incorporates data from academic tasks such as ScienceQA, TextVQA, and SBU. Introducing this dataset significantly improves a model's image-text alignment, which is a key aspect of multimodal representation learning.

Created:

2024-06-06

Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

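ShareGPT4V is a large-scale, high-quality image-text dataset containing 1.2 million image-text pairs, built specifically for training vision-language models to improve image understanding and text generation. By aligning visual and language features, it strengthens a model's instruction-following ability, and it covers academic tasks such as ScienceQA and TextVQA. It was released in 2023 by the University of Science and Technology of China and Shanghai AI Laboratory.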

The content above was collected and summarized by 遇见数据集.