apple/DFNDR-12M-bf16

Name: apple/DFNDR-12M-bf16
Creator: apple
Published: 2026-04-27 14:44:08
License: No description available

Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/apple/DFNDR-12M-bf16

Description:

DFNDR-12M-BFloat16 is an image-text dataset containing synthetic captions, embeddings, and metadata. It is based on DFN-12M, a uniformly sampled subset of 12.8M samples from DFN-2B. The dataset uses an ensemble of two stronger DFN teachers (DFN2B-CLIP-ViT-L-14 and DFN2B-CLIP-ViT-L-14-39B) and improved synthetic captions generated by MobileCLIP2-CoCa-ViT-L-14. For DFNDR-12M, 30 strong random image augmentations are applied (2 for DFNDR-2B). Embeddings of the teacher ensemble on augmented images, real captions, and synthetic captions are computed. Embeddings are 1536-D concatenations of 2x768-D vectors. One sample consists of one randomly augmented image, one ground-truth caption, and one randomly picked synthetic caption. This is the BFloat16 version of the dataset, with embeddings stored in compressed .pth.gz format with BFloat16 precision.
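The embedding layout and precision described above can be illustrated with a minimal, standard-library-only sketch. It does not read the dataset's actual .pth.gz files, whose internal structure is not described here. The helper names (`float_to_bf16_bits`, `bf16_bits_to_float`, `split_ensemble`) are hypothetical, and two things are assumptions rather than facts from the description: that each 768-D half corresponds to one teacher, and that the halves are stored back to back in a fixed order.

```python
import struct

TEACHER_DIM = 768                 # width of each concatenated vector (from the description)
ENSEMBLE_DIM = 2 * TEACHER_DIM    # 1536-D concatenated embedding
BYTES_PER_BF16 = 2                # BFloat16 uses 16 bits per value


def float_to_bf16_bits(x: float) -> int:
    """Round a finite float to BFloat16 and return its 16-bit pattern.

    BFloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of a float32. NaN inputs are not handled specially in this sketch.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that are dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BFloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def split_ensemble(embedding):
    """Split a 1536-D embedding into two 768-D halves.

    Assumption: the two halves are stored contiguously; which half belongs
    to which teacher is not stated in the description.
    """
    if len(embedding) != ENSEMBLE_DIM:
        raise ValueError(f"expected {ENSEMBLE_DIM} values, got {len(embedding)}")
    return embedding[:TEACHER_DIM], embedding[TEACHER_DIM:]


# Uncompressed size of one embedding at BFloat16 precision.
bytes_per_embedding = ENSEMBLE_DIM * BYTES_PER_BF16

# Precision loss from storing a value as BFloat16.
approx_tenth = bf16_bits_to_float(float_to_bf16_bits(0.1))

first_half, second_half = split_ensemble(list(range(ENSEMBLE_DIM)))
```

At BFloat16 precision one 1536-D embedding occupies 3072 bytes before compression, half of what float32 would need, at the cost of keeping only about three significant decimal digits per value.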

Provided by:

apple
