apple/DFNDR-2B
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/apple/DFNDR-2B
Description:
DFNDR-2B is an image-text dataset containing synthetic captions, embeddings, and metadata. The metadata was generated by running pretrained image-text models on DFN-2B, a 2B-sample filtered subset of DataComp-12B. DFNDR-2B builds on DFN-2B by using an ensemble of two stronger DFN teacher models and improved synthetic captions. Each record includes an image URL, synthetic captions, augmentation parameters, and embeddings (image embeddings, text embeddings, and synthetic caption embeddings). The dataset is intended to improve training efficiency, and is reported to be up to 1.7x more efficient than standard CLIP training. It is curated by DataComp and Apple and is licensed under CC-BY-NC-ND-4.0.
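To make the record structure described above more concrete, here is a minimal Python sketch of what one sample might look like in memory. The class name `ReinforcedSample`, all field names, and the toy values are assumptions made for illustration; the actual column names and storage format are defined by the dataset card, not by this sketch. The helper `cosine_similarity` is a generic way to compare two stored embeddings and is not taken from the dataset's tooling.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List


@dataclass
class ReinforcedSample:
    """Hypothetical view of one record; field names are illustrative only."""
    url: str                                      # image URL
    synthetic_captions: List[str]                 # generated captions
    augmentation_params: List[Dict[str, float]]   # one entry per stored augmentation
    image_embeddings: List[List[float]]           # one embedding per augmentation
    text_embedding: List[float]                   # embedding of the original text
    synthetic_caption_embeddings: List[List[float]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Toy record with made-up values, only to show how the pieces fit together.
sample = ReinforcedSample(
    url="https://example.com/image.jpg",
    synthetic_captions=["a dog running on a beach"],
    augmentation_params=[{"crop_scale": 0.8}],
    image_embeddings=[[0.6, 0.8]],
    text_embedding=[0.8, 0.6],
    synthetic_caption_embeddings=[[0.6, 0.8]],
)

# Compare the stored image embedding with the synthetic caption embedding.
similarity = cosine_similarity(
    sample.image_embeddings[0], sample.synthetic_caption_embeddings[0]
)
```

Keeping embeddings alongside the augmentation parameters means a training loop could, in principle, reuse precomputed teacher outputs instead of re-running the teachers, which is one plausible reading of the efficiency claim above.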
Provider:
apple



