neulab/PangeaInstruct
Source: Hugging Face. Updated: 2025-02-02. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/neulab/PangeaInstruct
Description:
PangeaIns is a multilingual, multicultural, multimodal instruction-tuning dataset spanning 39 languages, with 6,450,624 samples in total (roughly 6M). It was used during the instruction-tuning phase of the Pangea-7B model. Its data sources include ALLAVA-4V, allava_vflan, Cambrian737k, and others. The dataset follows the LLaVA data format. Image data is provided as compressed archives (such as .tar or .zip), which must be extracted before use.
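
Because the images ship as archives, a local copy needs an extraction pass before the image paths can be used. The following is a minimal sketch using only the Python standard library, not an official script from the dataset authors. The function name `extract_archives` and the choice to unpack each archive next to where it sits are assumptions made for illustration; the actual directory layout expected by the annotation files should be checked against the dataset card.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archives(root, dest=None):
    """Extract every .tar and .zip archive found under `root`.

    Each archive is unpacked into `dest` if given, otherwise into the
    directory that contains the archive (an assumed layout).
    Returns the list of archives that were processed.
    """
    root = Path(root)
    processed = []
    # Materialize the listing first so files created during extraction
    # are not visited in the same pass.
    for archive in sorted(root.rglob("*")):
        if not archive.is_file():
            continue
        target = Path(dest) if dest is not None else archive.parent
        if archive.suffix == ".zip" and zipfile.is_zipfile(archive):
            target.mkdir(parents=True, exist_ok=True)
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
        elif archive.suffix == ".tar" and tarfile.is_tarfile(archive):
            target.mkdir(parents=True, exist_ok=True)
            with tarfile.open(archive) as tf:
                # Use the safer "data" extraction filter where the running
                # Python version supports it.
                if hasattr(tarfile, "data_filter"):
                    tf.extractall(target, filter="data")
                else:
                    tf.extractall(target)
        else:
            continue
        processed.append(archive)
    return processed
```

Calling `extract_archives("PangeaInstruct")` on a hypothetical local download directory would unpack every archive in place and return the paths it handled. Archives are left on disk afterwards, so they can be deleted separately once the extracted images have been verified.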
Provider:
neulab



