BLIP3o-60k
Source: ModelScope · Updated 2026-05-22 · Indexed 2025-05-17
Download link:
https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k
Resource description:
This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It covers the following categories:
1. JourneyDB
2. Human (including MSCOCO with human caption, human gestures, occupations)
3. Dalle3
4. Geneval (no overlap with the test set)
5. Common objects
6. Simple text
The following code downloads the tar files:
```
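from huggingface_hub import snapshot_download
# Download the dataset repository (including the tar files) into the local Hugging Face cache; returns the local folder path
snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset')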
```
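`snapshot_download` returns the path of the local folder it downloaded into. The sketch below collects the tar files from that folder in a stable order. It makes two assumptions: the shards may sit in subfolders, and a fixed shard order is useful because `glob.glob` returns paths in arbitrary order. The helper name `list_tar_shards` is hypothetical and not part of any library.

```python
import glob
import os
import tempfile


def list_tar_shards(root: str) -> list[str]:
    """Return every .tar file under `root`, including subfolders, sorted.

    Hypothetical helper: the exact folder layout of the BLIP3o-60k download
    is not documented here, so this searches recursively.
    """
    pattern = os.path.join(root, "**", "*.tar")
    # With recursive=True, "**" also matches zero directories, so top-level
    # tar files are included. glob order is arbitrary, so sort the result.
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Demo on a throwaway directory instead of the real download.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sub"))
        for name in ["b.tar", "a.tar", os.path.join("sub", "c.tar"), "notes.txt"]:
            open(os.path.join(tmp, name), "w").close()
        for path in list_tar_shards(tmp):
            print(os.path.relpath(path, tmp))
```

For example, `data_files = list_tar_shards(snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset'))` would give a sorted list that can be passed as `data_files` to `load_dataset`.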
You can also read the tar files with the Hugging Face datasets library without extracting them:
```
from datasets import load_dataset
import glob
# Replace both placeholder paths with your own download and cache directories
data_files = glob.glob('/your/datasets/path/*.tar')
train_dataset = load_dataset("webdataset", data_files=data_files, cache_dir='/your/cache/directory/', split="train", num_proc=64)
```
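The webdataset loader treats each tar file as a sequence of samples. Consecutive members that share a key (the path up to the first dot in the file name) form one sample, and the rest of the name (for example `jpg` or `txt`) becomes the field name. The sketch below illustrates that convention using only the standard library. It is not the datasets implementation, and it leaves every field as raw bytes, whereas the real loader also decodes fields such as images. The helper names `iter_samples` and `build_demo_tar` are hypothetical. The `.jpg`/`.txt` member names in the demo are also assumptions, not taken from the BLIP3o-60k shards.

```python
import io
import re
import tarfile

# WebDataset-style split: key = path up to the first dot in the basename,
# field = everything after that dot.
_KEY_EXT = re.compile(r"^((?:.*/|)[^.]+)[.]([^/]*)$")


def iter_samples(fileobj):
    """Yield one dict per sample, grouping consecutive tar members by key."""
    current = {}
    # "r|*" reads the tar as a stream, in member order.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            match = _KEY_EXT.match(member.name)
            if match is None:
                continue  # no extension: not part of any sample
            key, field = match.group(1), match.group(2).lower()
            if current and current["__key__"] != key:
                yield current  # key changed: previous sample is complete
                current = {}
            current["__key__"] = key
            current[field] = tar.extractfile(member).read()
    if current:
        yield current


def build_demo_tar(entries):
    """Hypothetical helper: write (name, bytes) pairs into an in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = build_demo_tar([
        ("00001.jpg", b"<image bytes>"),
        ("00001.txt", b"a red bicycle leaning on a wall"),
        ("00002.jpg", b"<image bytes>"),
        ("00002.txt", b"a cat asleep on a sofa"),
    ])
    for sample in iter_samples(demo):
        fields = sorted(k for k in sample if k != "__key__")
        print(sample["__key__"], fields)
```

Because samples are formed from consecutive members, files belonging to one key must sit next to each other inside a shard. This requirement of the WebDataset format is also why the tar files can be streamed without being unpacked first.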
Provider:
maas
Created:
2025-05-16



