Synthetic Visual Instruction Set

Name: Synthetic Visual Instruction Set
Creator: Generated by the authors using CogVLM-17B
License: No description available

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/DCDmllm/Align2LLaVA

Description:

This dataset is derived from MSCOCO and comprises 158,000 images, with multimodal instructions generated by the CogVLM-17B model. The set is compressed to about 9% of its original size, from 158,000 images down to 14,000 used for model training, while model performance is maintained or even improved. The tasks cover instruction following and multimodal understanding.
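The 9% figure can be checked from the two sizes quoted in the description. The following is a minimal sketch of that arithmetic only; the helper name `retention_ratio` is hypothetical and is not part of the dataset or its repository.

```python
def retention_ratio(kept: int, total: int) -> float:
    """Return the fraction of the original set that is retained."""
    if total <= 0:
        raise ValueError("total must be positive")
    return kept / total


# Sizes as quoted in the description above.
full_size = 158_000
reduced_size = 14_000

ratio = retention_ratio(reduced_size, full_size)
print(f"Retained fraction: {ratio:.1%}")  # prints "Retained fraction: 8.9%"
```

The exact ratio is roughly 8.9%, which the description rounds to 9%.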

Provider:

Generated by the authors using CogVLM-17B
