
BoyaWu10/Bunny-v1_1-data

Source: Hugging Face. Updated 2024-07-01. Indexed 2024-07-06.
Download link:
https://hf-mirror.com/datasets/BoyaWu10/Bunny-v1_1-data
Description:
Bunny-v1.1-data is the training dataset for both the Bunny-v1.1 and Bunny-v1.0 model series, including Bunny-v1.1-Llama-3-8B-V and Bunny-v1.1-4B. It contains pretraining and finetuning data. The pretraining data consists of 2 million image-text pairs randomly sampled from a high-quality coreset of LAION-2B that has fewer duplicates and more informative samples; it is the same as the pretraining data in Bunny-v1.0-data. For finetuning, Bunny-695K, built by modifying SVIT-mix-665K, is combined with LLaVA-665K and ALLaVA-Instruct-4V to form Bunny-LLaVA-1.4M, Bunny-ALLaVA-1.3M, and Bunny-LLaVA-ALLaVA-2M. The images are packed into multiple parts, which must be merged and unpacked after downloading.
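The description says the image archives come in multiple parts that have to be merged and unpacked. A minimal sketch of that step in Python follows. It assumes the parts are plain byte-wise splits of one tar archive and that sorting their file names reproduces the original order. The file names in the usage comment (`images.tar.gz.part-*`, `images.tar.gz`) and the helper names `merge_parts` and `unpack` are hypothetical, chosen for illustration only; check the actual file names in the repository before running it.

```python
import shutil
import tarfile
from pathlib import Path


def merge_parts(parts_dir: Path, pattern: str, merged_path: Path) -> Path:
    """Concatenate split archive parts into one file.

    Assumption: the parts are byte-wise splits of a single archive and
    their names sort lexicographically into the original order.
    """
    parts = sorted(parts_dir.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {parts_dir}")
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                # Stream each part so large archives never sit fully in memory.
                shutil.copyfileobj(chunk, merged)
    return merged_path


def unpack(archive_path: Path, out_dir: Path) -> None:
    """Extract a (possibly compressed) tar archive into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(out_dir)


# Hypothetical usage; the directory and file names below are assumptions:
#   merged = merge_parts(Path("pretrain"), "images.tar.gz.part-*",
#                        Path("pretrain/images.tar.gz"))
#   unpack(merged, Path("pretrain/images"))
```

Merging needs roughly the total size of all parts in extra free disk space, plus room for the extracted images, so it is worth checking available space first.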
Provided by:
BoyaWu10