OpenGVLab/Mono-InternVL-2B-Synthetic-Data
Source: Hugging Face | Updated: 2025-07-22 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/OpenGVLab/Mono-InternVL-2B-Synthetic-Data
Description:
The Mono-InternVL-2B Synthetic Data dataset is used to train the S1.2 stage of the Mono-InternVL-2B model. It contains short captions for 258 million images sampled from Laion-2B, Coyo-700M, and SAM(en). In total there are 259,064,832 records, split across 3,072 JSONL files of 84,331 records each.
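As a rough illustration of the layout described above, the following sketch checks that the stated file and record counts agree, and shows one way to stream records from a single JSONL shard that has already been downloaded. The helper names `iter_jsonl` and `count_records` are hypothetical, and nothing is assumed about the shard file names or the fields inside each record, since the listing does not give them.

```python
import json
from pathlib import Path
from typing import Iterator

# Figures quoted in the description above.
NUM_FILES = 3_072
RECORDS_PER_FILE = 84_331
TOTAL_RECORDS = 259_064_832

# The per-file count times the file count should match the stated total.
assert NUM_FILES * RECORDS_PER_FILE == TOTAL_RECORDS


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_records(path: Path) -> int:
    """Count records in one shard without loading it all into memory."""
    return sum(1 for _ in iter_jsonl(path))
```

If the description holds, calling `count_records` on any one downloaded shard should give 84,331. Streaming line by line keeps memory use flat, which matters at this scale.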
Provided by:
OpenGVLab



