Open-Bee/Bee-Training-Data-Stage2
Source: Hugging Face. Updated 2026-03-10; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/Open-Bee/Bee-Training-Data-Stage2
Description:
Bee-Training-Data-Stage2 is the stage-2 training dataset for the Bee-8B multimodal large language model. It is based on the high-quality Honey-Data-15M corpus and was cleaned and enriched with the HoneyPipe pipeline. The dataset targets image-to-text tasks, contains approximately 15 million samples, and is fully open-source.
Provided by:
Open-Bee



