apple/DFNDR-12M-bf16
Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/apple/DFNDR-12M-bf16
Description:
DFNDR-12M-BFloat16 is an image-text dataset containing synthetic captions, embeddings, and metadata. It is based on DFN-12M, a uniformly sampled subset of 12.8M samples from DFN-2B. The dataset uses an ensemble of two stronger DFN teachers (DFN2B-CLIP-ViT-L-14 and DFN2B-CLIP-ViT-L-14-39B) and improved synthetic captions generated by MobileCLIP2-CoCa-ViT-L-14. For DFNDR-12M, 30 strong random image augmentations are applied (2 for DFNDR-2B). Embeddings of the teacher ensemble on augmented images, real captions, and synthetic captions are computed. Embeddings are 1536-D concatenations of 2x768-D vectors. One sample consists of one randomly augmented image, one ground-truth caption, and one randomly picked synthetic caption. This is the BFloat16 version of the dataset, with embeddings stored in compressed .pth.gz format with BFloat16 precision.
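To make the storage details above concrete, here is a minimal standard-library sketch of what BFloat16 precision does to a single embedding value, and of how a 1536-D vector divides into two 768-D halves. The function names are hypothetical, round-to-nearest-even is an assumed rounding mode, and the order of the two halves within the concatenation is not stated in the description, so "first" and "second" below carry no claim about which teacher is which.

```python
import struct

TEACHER_DIM = 768  # per-vector width stated in the description


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern for a finite float.

    BFloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa
    bits of an IEEE float32. Round-to-nearest-even is assumed here;
    NaN and infinity are not handled in this sketch.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BFloat16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def split_embedding(vec, dim: int = TEACHER_DIM):
    """Split a concatenated 2*dim vector into its two dim-sized halves."""
    if len(vec) != 2 * dim:
        raise ValueError(f"expected {2 * dim} values, got {len(vec)}")
    return vec[:dim], vec[dim:]


if __name__ == "__main__":
    # Precision loss for one value after a BFloat16 round trip.
    original = 0.1
    restored = bf16_bits_to_float(float_to_bf16_bits(original))
    print(original, restored, abs(original - restored))

    # A placeholder 1536-D vector, not real dataset content.
    first, second = split_embedding([0.0] * (2 * TEACHER_DIM))
    print(len(first), len(second))
```

Reading the actual `.pth.gz` files would presumably involve gzip decompression followed by a PyTorch loader, but their internal layout is not described here, so that step is left out.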
Provider:
apple



