llava_train_image

Name: llava_train_image
Creator: maas
Published: 2025-11-28 15:42:23
License: 暂无描述

魔搭社区2025-11-28 更新2025-11-03 收录

下载链接：

https://modelscope.cn/datasets/wenxi01/llava_train_image

下载链接

链接失效反馈

官方服务：

资源简介：

# LLaVA Visual Instruct Pretrain Dataset Card ## Dataset details **Dataset type:** LLaVA Visual Instruct Pretrain LCS-558K is a subset of LAION/CC/SBU dataset, filtered with a more balanced concept coverage distribution. Captions are also associated with [BLIP synthetic caption](https://github.com/salesforce/BLIP#pre-training-datasets-download) for reference. It is constructed for the pretraining stage for feature alignment in visual instruction tuning. We aim to build large multimodal towards GPT-4 vision/language capability. **Dataset date:** LLaVA Visual Instruct CC3M Pretrain 595K was created in May 2023. **Dataset structure:** - `blip_laion_cc_sbu_558k.json` contains the multimodal synthesized conversation from the image-caption pairs, by adding randomly selected instructions like: "Describe this image". It is used for pretraining in LLaVA. We use the raw CC-3M caption as the default answer. - `blip_laion_cc_sbu_558k_meta.json` contains the meta data of the image file name, image URL, synthetic BLIP caption. - `images.zip` contains all raw images of the filtered subset from LAION/CC/SBU. Important notice: Upon the request from the community, as ~15% images of the original LAION/CC/SBU dataset are no longer accessible, we upload images.zip for better reproducing our work in research community. It should not be used for any other purpose. The use of these images must comply with the LAION/CC/SBU license. This may be taken down when requested by the original LAION/CC/SBU dataset owner or owners of the referenced images. **Paper or resources for more information:** https://llava-vl.github.io/ **License:** Must comply with license of [CC-3M](https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE), [BLIP](https://github.com/salesforce/BLIP/blob/main/LICENSE.txt) (if you use their synthetic caption). CC-3M The dataset may be freely used for any purpose, although acknowledgement of Google LLC ("Google") as the data source would be appreciated. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. **Where to send questions or comments about the model:** https://github.com/haotian-liu/LLaVA/issues ## Intended use **Primary intended uses:** The primary use of LLaVA is research on large multimodal models and chatbots. **Primary intended users:** The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

# LLaVA视觉指令预训练数据集卡片 ## 数据集详情 **数据集类型：** LLaVA视觉指令预训练LCS-558K是LAION/CC/SBU数据集的子集，经过筛选以实现更均衡的概念覆盖分布。其标注还关联了[BLIP合成标注](https://github.com/salesforce/BLIP#pre-training-datasets-download)以供参考。该数据集专为视觉指令微调阶段的特征对齐任务构建，我们的目标是研发具备GPT-4视觉与语言能力的大型多模态模型。 **数据集发布时间：** LLaVA视觉指令CC3M预训练595K数据集于2023年5月发布。 **数据集结构：** - `blip_laion_cc_sbu_558k.json` 包含由图像-标注对生成的多模态合成对话，通过添加随机选取的指令（如“描述此图像”）构建，用于LLaVA的预训练。我们将原始CC-3M标注作为默认回复。 - `blip_laion_cc_sbu_558k_meta.json` 包含图像文件名、图像URL以及BLIP合成标注的元数据。 - `images.zip` 包含从LAION/CC/SBU筛选出的子集的所有原始图像。重要提示：应社区需求，原始LAION/CC/SBU数据集中约15%的图像已无法访问，我们上传images.zip以助力研究社区复现本研究工作。该压缩包仅可用于研究复现，不得挪作他用。使用这些图像必须遵守LAION/CC/SBU的许可协议。若原始LAION/CC/SBU数据集所有者或相关图像所有者提出要求，该压缩包可能会被移除。 **更多信息的论文或资源：** https://llava-vl.github.io/ **许可协议：** 必须遵守[CC-3M](https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE)以及[BLIP](https://github.com/salesforce/BLIP/blob/main/LICENSE.txt)（若使用其合成标注）的许可协议。 CC-3M 本数据集可免费用于任何用途，尽管我们感谢用户注明Google LLC ("Google") 为数据集来源。本数据集按“现状”提供，不附带任何明示或暗示的担保。Google概不承担因使用本数据集而产生的任何直接或间接损害的全部责任。 **关于本数据集的问题或意见反馈渠道：** https://github.com/haotian-liu/LLaVA/issues ## 预期用途 **主要预期用途：** 本数据集的主要用途为大型多模态模型与聊天机器人的相关研究。 **主要目标用户：** 本数据集的主要目标用户为计算机视觉、自然语言处理、机器学习与人工智能领域的研究人员与爱好者。

提供机构：

maas

创建时间：

2025-10-12

5,000+

优质数据集

54 个

任务类型

进入经典数据集