DavidNguyen/LLAVA-LibMoE

Name: DavidNguyen/LLAVA-LibMoE
Creator: DavidNguyen
Published: 2026-04-29 21:44:36
License: No description available

Source: Hugging Face. Updated: 2026-04-29. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/DavidNguyen/LLAVA-LibMoE
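A minimal sketch of how the files behind this link might be fetched, assuming the `huggingface_hub` package is installed and that the mirror accepts the same requests as the main Hub. The helper names `dataset_url` and `download_dataset`, the `local_dir` default, and the use of the `endpoint` argument are illustrative assumptions, not instructions from the dataset's author.

```python
MIRROR = "https://hf-mirror.com"          # mirror host taken from the link above
REPO_ID = "DavidNguyen/LLAVA-LibMoE"      # dataset repository id from this listing


def dataset_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a repository id (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download_dataset(repo_id: str = REPO_ID,
                     endpoint: str = MIRROR,
                     local_dir: str = "./LLAVA-LibMoE") -> str:
    """Download a snapshot of the dataset repository and return its local path.

    Assumption: huggingface_hub is installed and the mirror is reachable.
    The listing reports a very large size category, so check free disk
    space before running this.
    """
    # Imported lazily so the URL helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",   # the repository is a dataset, not a model
        endpoint=endpoint,     # route requests through the mirror
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(dataset_url(REPO_ID))
    # Uncomment to start the (large) download:
    # print(download_dataset())
```

Keeping the import inside `download_dataset` lets the URL helper run on its own, which is useful for checking the link before committing to a large transfer.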

Description:

LLAVA-LibMoE is a large-scale multimodal dataset built from the LLaVA-665K and OneVision-1M2 image sources. It targets image-to-text, text generation, and question answering tasks. It comprises several sub-datasets, including COCO, GQA, OCR-VQA, TextVQA, and Visual Genome, which together provide a varied collection of image-text pairs. The language is English, the listed size category is between 1 billion and 10 billion, and the dataset is intended for training and evaluating Mixture of Experts (MoE) models in large language models.

Provider:

DavidNguyen
