DEMYSTIFYING CLIP DATA
Source: DataCite Commons. Updated: 2026-01-07. Indexed: 2026-05-05.
Download link:
https://service.tib.eu/ldmservice/dataset/6f76a23d-3d36-4cee-bea2-00a70415d587
Description:
Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative models. We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective. However, CLIP only provides very limited information about its data and how it has been collected, leading to works that aim to reproduce CLIP’s data by filtering with its model parameters. In this work, we intend to reveal CLIP’s data curation approach and in our pursuit of making it open to the community introduce Metadata-Curated Language-Image Pre-training (MetaCLIP). MetaCLIP takes a raw data pool and metadata (derived from CLIP’s concepts) and yields a balanced subset over the metadata distribution.
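The description says only that MetaCLIP maps a raw data pool plus metadata to a subset balanced over the metadata distribution. As a rough illustration of what such balancing could look like, here is a minimal sketch. It assumes captions are matched to metadata entries by case-insensitive substring search and that frequent entries are down-sampled to a per-entry cap. The function name `balance_by_metadata`, the `cap` parameter, and the sampling rule are all hypothetical choices for this example and are not taken from this record.

```python
import random
from collections import Counter


def balance_by_metadata(texts, metadata, cap, seed=0):
    """Return a subset of `texts` balanced over `metadata` entries.

    Hypothetical sketch: the matching and sampling rules are assumptions,
    not the curation procedure documented by this dataset record.
    """
    rng = random.Random(seed)

    # Step 1: find which metadata entries occur in each text (substring match).
    matches = [[m for m in metadata if m in t.lower()] for t in texts]

    # Step 2: count how many texts match each metadata entry.
    counts = Counter(m for ms in matches for m in ms)

    # Step 3: keep a text if at least one matched entry samples it.
    # Entries with more than `cap` matches are down-weighted to cap / count;
    # rarer entries keep every matching text. Unmatched texts are dropped.
    kept = []
    for text, ms in zip(texts, matches):
        if any(rng.random() < min(1.0, cap / counts[m]) for m in ms):
            kept.append(text)
    return kept


if __name__ == "__main__":
    pool = ["a photo of a dog"] * 100 + ["a rare axolotl", "no match here"]
    subset = balance_by_metadata(pool, ["dog", "axolotl"], cap=10)
    print(len(pool), "->", len(subset))
```

In this sketch the single "axolotl" caption is always retained, the caption with no metadata match is always discarded, and the over-represented "dog" captions are thinned to roughly the cap, which flattens the head of the distribution while preserving the tail.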
Provider:
TIB
Created:
2024-12-02



