ShareGPT4V: A Large-Scale, High-Quality Image-Text Dataset

Source: HyperAI (超神经) | Updated: 2024-06-07 | Indexed: 2024-06-29

Download link:

https://hyper.ai/cn/datasets/32313

Overview:

ShareGPT4V is a high-quality dataset of image-text pairs used to train vision-language models (VLMs) and improve their image understanding and text generation. It contains 1.2 million image-text pairs. These pairs are intended to align visual and linguistic features and to strengthen a model's instruction-following ability, and the data also covers academic tasks such as ScienceQA, TextVQA, and SBU. According to the dataset description, models trained with this data show substantially better image-text alignment, a key aspect of multimodal representation learning.

Created:

2024-06-06

AI-Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

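ShareGPT4V is a large-scale, high-quality image-text dataset containing 1.2 million image-text pairs, built for training vision-language models to improve image understanding and text generation. By aligning visual and language features, it strengthens a model's instruction-following ability, and it covers academic tasks such as ScienceQA and TextVQA. It was released in 2023 by the University of Science and Technology of China and Shanghai AI Laboratory.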

The summary above was collected and generated by AI.
