LLaVA-Pretrain
收藏魔搭社区2026-05-16 更新2024-05-15 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/LLaVA-Pretrain
下载链接
链接失效反馈官方服务:
资源简介:
# LLaVA Visual Instruct Pretrain Dataset Card
## Dataset details
**Dataset type:**
LLaVA Visual Instruct Pretrain LCS-558K is a subset of LAION/CC/SBU dataset, filtered with a more balanced concept coverage distribution.
Captions are also associated with [BLIP synthetic caption](https://github.com/salesforce/BLIP#pre-training-datasets-download) for reference.
It is constructed for the pretraining stage for feature alignment in visual instruction tuning.
We aim to build large multimodal towards GPT-4 vision/language capability.
**Dataset date:**
LLaVA Visual Instruct CC3M Pretrain 595K was created in May 2023.
**Dataset structure:**
- `blip_laion_cc_sbu_558k.json` contains the multimodal synthesized conversation from the image-caption pairs, by adding randomly selected instructions like: "Describe this image". It is used for pretraining in LLaVA. We use the raw CC-3M caption as the default answer.
- `blip_laion_cc_sbu_558k_meta.json` contains the meta data of the image file name, image URL, synthetic BLIP caption.
- `images.zip` contains all raw images of the filtered subset from LAION/CC/SBU. Important notice: Upon the request from the community, as ~15% images of the original LAION/CC/SBU dataset are no longer accessible, we upload images.zip for better reproducing our work in research community. It should not be used for any other purpose. The use of these images must comply with the LAION/CC/SBU license. This may be taken down when requested by the original LAION/CC/SBU dataset owner or owners of the referenced images.
**Paper or resources for more information:**
https://llava-vl.github.io/
**License:**
Must comply with license of [CC-3M](https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE), [BLIP](https://github.com/salesforce/BLIP/blob/main/LICENSE.txt) (if you use their synthetic caption).
CC-3M
The dataset may be freely used for any purpose, although acknowledgement of
Google LLC ("Google") as the data source would be appreciated. The dataset is
provided "AS IS" without any warranty, express or implied. Google disclaims all
liability for any damages, direct or indirect, resulting from the use of the
dataset.
**Where to send questions or comments about the model:**
https://github.com/haotian-liu/LLaVA/issues
## Intended use
**Primary intended uses:**
The primary use of LLaVA is research on large multimodal models and chatbots.
**Primary intended users:**
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
# LLaVA视觉指令预训练数据集卡片
## 数据集详情
**数据集类型:**
LLaVA视觉指令预训练LCS-558K是LAION/CC/SBU数据集的子集,经过筛选以实现更均衡的概念覆盖分布。字幕还关联了[BLIP合成字幕(BLIP synthetic caption)](https://github.com/salesforce/BLIP#pre-training-datasets-download)以供参考。本数据集专为视觉指令微调中的特征对齐预训练阶段构建,我们的目标是打造具备GPT-4视觉-语言能力的大型多模态模型。
**数据集发布日期:**
LLaVA视觉指令CC3M预训练595K数据集创建于2023年5月。
**数据集结构:**
- `blip_laion_cc_sbu_558k.json` 存储了图像-字幕对生成的多模态合成对话,通过添加随机选取的指令(如“描述该图像”)生成内容,用于LLaVA的预训练流程,我们将原始CC-3M字幕作为默认回复。
- `blip_laion_cc_sbu_558k_meta.json` 存储了图像文件名、图像URL以及BLIP合成字幕的元数据。
- `images.zip` 包含了从LAION/CC/SBU筛选出的子集中所有原始图像。重要提示:应社区要求,由于原始LAION/CC/SBU数据集中约15%的图像已无法访问,我们上传`images.zip`以方便研究社区复现我们的工作。本压缩包仅用于学术研究复现,不得用于其他用途。使用这些图像必须遵守LAION/CC/SBU的许可协议。若原始LAION/CC/SBU数据集所有者或相关图像所有者提出要求,本压缩包可能会被下架。
**更多信息的论文或资源:**
https://llava-vl.github.io/
**许可协议:**
必须遵守[CC-3M](https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE)和[BLIP](https://github.com/salesforce/BLIP/blob/main/LICENSE.txt)(若使用其合成字幕)的许可协议。
CC-3M
本数据集可免费用于任何用途,若能注明数据来源为谷歌有限责任公司(Google LLC,简称“谷歌”)将不胜感激。本数据集按“现状”提供,不附带任何明示或暗示的担保。谷歌对因使用本数据集而产生的任何直接或间接损害不承担任何责任。
**关于本数据集的问题或意见请发送至:**
https://github.com/haotian-liu/LLaVA/issues
## 预期用途
**主要用途:**
本数据集的主要用途为开展大型多模态模型与聊天机器人相关的研究。
**主要目标用户:**
本数据集的主要目标用户为计算机视觉、自然语言处理、机器学习以及人工智能领域的研究人员与爱好者。
提供机构:
maas
创建时间:
2024-05-22



