LLaVA-OneVision-Data

Hugging Face2024-08-07 更新2024-12-12 收录

下载链接：

https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集包含多个配置，每个配置都有特定的名称和特征。主要特征包括ID、图像和对话，其中对话由发送者和内容组成。数据集支持多种语言，包括英语和中文，并且每个配置都有训练数据的大小和示例数量。

This dataset includes multiple configurations, each with a specific name and characteristics. Its main features consist of ID, images, and dialogues, where a dialogue is composed of a sender and its corresponding content. The dataset supports multiple languages including English and Chinese, and each configuration is accompanied by the size of the training data and the number of examples.

创建时间：

2024-07-25

原始信息汇总

LLaVA-OneVision-Data 数据集概述

基本信息

语言: 英语和中文
许可证: Apache 2.0
数据集名称: llava-onevision-data

数据集配置详情

CLEVR-Math(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 791346970
  - 样本数: 5280
下载大小: 441208499
数据集大小: 791346970

FigureQA(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 463326576.625
  - 样本数: 17587
下载大小: 258197193
数据集大小: 463326576.625

GEOS(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 1503641
  - 样本数: 498
下载大小: 684471
数据集大小: 1503641

GeoQA+(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 53579705.75
  - 样本数: 17162
下载大小: 33480538
数据集大小: 53579705.75

Geometry3K(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 218085473.5
  - 样本数: 9724
下载大小: 125914780
数据集大小: 218085473.5

IconQA(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 208430568.375
  - 样本数: 22589
下载大小: 117222488
数据集大小: 208430568.375

MapQA(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 384120915.875
  - 样本数: 5225
下载大小: 215768443
数据集大小: 384120915.875

PMC-VQA(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 571444866.5
  - 样本数: 35948
下载大小: 326541003
数据集大小: 571444866.5

Super-CLEVR(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 2795082410.75
  - 样本数: 8642
下载大小: 1580301917
数据集大小: 2795082410.75

TabMWP(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 307726997.5
  - 样本数: 22452
下载大小: 173938487
数据集大小: 307726997.5

UniGeo(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 38296693.375
  - 样本数: 11949
下载大小: 24170743
数据集大小: 38296693.375

VizWiz(MathV360K)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 1170333936.5
  - 样本数: 6604
下载大小: 660752297
数据集大小: 1170333936.5

ai2d(cauldron,llava_format)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 438572782.375
  - 样本数: 2429
下载大小: 437348514
数据集大小: 438572782.375

ai2d(gpt4v)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 866076731
  - 样本数: 4864
下载大小: 860306578
数据集大小: 866076731

ai2d(internvl)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 1832787249.625
  - 样本数: 12403
下载大小: 527493895
数据集大小: 1832787249.625

allava_instruct_laion4v

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 5981767621.25
  - 样本数: 49990
下载大小: 5873046236
数据集大小: 5981767621.25

allava_instruct_vflan4v

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 2680974558.25
  - 样本数: 19990
下载大小: 2670088751
数据集大小: 2680974558.25

aokvqa(cauldron,llava_format)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 6896420844.25
  - 样本数: 16534
下载大小: 6894236970
数据集大小: 6896420844.25

chart2text(cauldron)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 1145458729.5
  - 样本数: 26956
下载大小: 1123681047
数据集大小: 1145458729.5

chartqa(cauldron,llava_format)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 815335215.5
  - 样本数: 18260
下载大小: 803084541
数据集大小: 815335215.5

chrome_writting

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 44422597.875
  - 样本数: 8825
下载大小: 39611257
数据集大小: 44422597.875

clevr(cauldron,llava_format)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 10528974543.625
  - 样本数: 69995
下载大小: 10460536445
数据集大小: 10528974543.625

diagram_image_to_text(cauldron)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 18858266
  - 样本数: 295
下载大小: 18659115
数据集大小: 18858266

dvqa(cauldron,llava_format)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 4487270615.625
  - 样本数: 199995
下载大小: 4277056467
数据集大小: 4487270615.625

figureqa(cauldron,llava_format)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 2351194509.625
  - 样本数: 99995
下载大小: 2222640639
数据集大小: 2351194509.625

geo170k(align)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 204236256.75
  - 样本数: 60242
下载大小: 58185410
数据集大小: 204236256.75

geo170k(qa)

特征:
- id: 字符串
- image: 图像
- conversations: 列表
  - from: 字符串
  - value: 字符串
- data_source: 字符串
分割:
- train:
  - 字节数: 266040519.125
  - 样本数: 67823
下载大小: 160022430
数据集大小:

搜集汇总

数据集介绍

构建方式

LLaVA-OneVision-Data数据集的构建基于多模态学习的需求，涵盖了图像与文本的交互场景。该数据集通过整合多个子数据集，如CLEVR-Math、FigureQA、GEOS等，构建了一个丰富的多模态对话数据集。每个子数据集均包含图像和与之相关的对话文本，对话内容围绕图像展开，涵盖了数学、几何、图表等多种领域。数据集的构建过程注重数据的多样性和复杂性，确保能够支持多模态模型的训练与评估。

特点

LLaVA-OneVision-Data数据集的特点在于其多模态性和广泛的应用场景。数据集中的每个样本包含图像和对话文本，对话内容围绕图像展开，涉及数学推理、视觉问答、图表理解等多种任务。数据集的多样性体现在其涵盖了从简单几何问题到复杂图表分析的广泛领域，能够为多模态模型提供丰富的训练数据。此外，数据集还提供了详细的数据来源信息，便于研究者追踪数据的原始出处。

使用方法

LLaVA-OneVision-Data数据集的使用方法主要围绕多模态模型的训练与评估展开。研究者可以通过加载数据集中的图像和对话文本，构建多模态输入，训练视觉-语言联合模型。数据集适用于多种任务，如视觉问答、图像描述生成、图表理解等。使用过程中，研究者可以根据具体任务选择相应的子数据集，或通过组合多个子数据集进行跨领域研究。数据集的详细分割信息也为模型的训练、验证和测试提供了便利。

背景与挑战

背景概述

LLaVA-OneVision-Data数据集是一个多模态数据集，专注于视觉与语言结合的复杂任务，涵盖了从数学推理到地理信息分析等多个领域。该数据集由多个子集组成，如CLEVR-Math、FigureQA、GEOS等，每个子集都针对特定的视觉问答任务进行了优化。数据集的创建时间不详，但其设计旨在推动多模态模型在复杂视觉推理任务中的表现，尤其是在需要结合图像和文本信息的场景中。该数据集的出现为视觉问答、图像理解和多模态学习等领域提供了丰富的研究资源，推动了相关技术的进步。

当前挑战

LLaVA-OneVision-Data数据集面临的挑战主要体现在两个方面。首先，数据集所解决的领域问题涉及复杂的视觉推理任务，例如数学问题的图像化表达、地理信息的视觉问答等，这些任务要求模型具备高度的多模态理解能力，能够同时处理图像和文本信息。其次，在数据集的构建过程中，如何确保图像与文本之间的高质量对齐是一个关键挑战。由于数据来源多样，图像和文本的标注质量可能存在差异，这对数据集的整体一致性和模型的训练效果提出了更高的要求。此外，数据集的规模庞大，如何高效地存储、处理和分发这些数据也是一个技术难题。

常用场景

经典使用场景

LLaVA-OneVision-Data数据集在视觉问答（VQA）和图像理解领域具有广泛的应用。其经典使用场景包括通过图像与文本的交互，训练多模态模型以理解和回答与图像内容相关的问题。该数据集特别适用于需要结合视觉和语言信息的任务，如数学问题的图像解答、地理图像的问答等。

衍生相关工作

基于LLaVA-OneVision-Data数据集，研究者们开发了多种多模态模型，如LLaVA和GPT-4V等。这些模型在视觉问答、图像生成和跨模态理解任务中表现出色。此外，该数据集还催生了一系列相关研究，如基于图像的数学问题求解、地理图像问答系统等，推动了多模态学习领域的发展。

数据集最近研究