Synthetic Visual Instruction Set
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/DCDmllm/Align2LLaVA
Resource description:
This dataset is derived from MSCOCO and comprises 158,000 images, with multimodal instructions generated by the CogVLM-17B model. Although it is compressed to about 9% of its original size (the 158,000 images are reduced to 14,000 for model training), it is reported to maintain or even improve model performance. The tasks cover instruction following and multimodal understanding.
Provider:
Generated by the authors using CogVLM-17B



