scaleinvariant/paired-open-images-embedded-pe-core-g14-448

Name: scaleinvariant/paired-open-images-embedded-pe-core-g14-448
Creator: scaleinvariant
Published: 2026-03-09 08:11:58
License: (no description provided)

Source: Hugging Face. Updated 2026-03-09; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/scaleinvariant/paired-open-images-embedded-pe-core-g14-448


Description:

---
license: cc-by-4.0
task_categories:
- image-feature-extraction
- zero-shot-image-classification
tags:
- open-images
- embeddings
- perception-encoder
- clip
- vision
size_categories:
- 100K<n<1M
---

# Paired Open Images with PE-Core-G14-448 Embeddings

This dataset contains **pairs** of images from [Open Images](https://storage.googleapis.com/openimages/web/index.html) along with their embeddings computed using Meta's [Perception Encoder](https://github.com/facebookresearch/perception_models) (`PE-Core-G14-448`).

Each row contains two images (as JPEG bytes), their metadata, and their corresponding 1280-dimensional embeddings.

## Data Layout

| Column | Description |
|--------|-------------|
| `image1_jpeg` | JPEG bytes for the first image |
| `image1_metadata` | Metadata for the first image |
| `image2_jpeg` | JPEG bytes for the second image |
| `image2_metadata` | Metadata for the second image |
| `image1_embedding0` | PE-Core-G14-448 embedding (dim=1280) for image 1 |
| `image2_embedding0` | PE-Core-G14-448 embedding (dim=1280) for image 2 |

## Splits

| Split | Files |
|-------|-------|
| train | 94 parquet shards |
| validation | 20 parquet shards |
| test | 5 parquet shards |

## Using the Embeddings

The embeddings behave like CLIP embeddings. You can use them for zero-shot classification, retrieval, or similarity search.

### Generating text embeddings for comparison

```bash
git clone https://github.com/facebookresearch/perception_models.git
cd perception_models
```

```python
import torch
import core.vision_encoder.pe as pe
import core.vision_encoder.transforms as transforms

model = pe.CLIP.from_config("PE-Core-G14-448", pretrained=True).cuda().eval()
tokenizer = transforms.get_text_tokenizer(model.context_length)

text_tokens = tokenizer(["dog", "cat"]).cuda()
with torch.no_grad(), torch.autocast("cuda"):
    _, text_features, _ = model(None, text_tokens)

# Compare with stored embeddings via cosine similarity
image_features = ...  # load from parquet
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = model.logit_scale.exp() * image_features @ text_features.T
probs = similarity.softmax(dim=-1)
```

## License

The images originate from Open Images, which is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
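
## Example: working with the stored embeddings

The comparison described under "Using the Embeddings" can also be sketched without loading the model, using only NumPy. This is a minimal illustration, not the dataset author's code. It assumes the `image1_embedding0` and `image2_embedding0` columns have already been read from a parquet shard into `(n, 1280)` float arrays, and how those columns are serialized is not stated in the card. The helper names (`l2_normalize`, `paired_cosine_similarity`, `zero_shot_probs`) are hypothetical, the arrays below are random stand-ins rather than real data, and the `logit_scale` value is a placeholder for `model.logit_scale.exp()`.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; the clip guards against division by zero."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def paired_cosine_similarity(emb1: np.ndarray, emb2: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    return np.sum(l2_normalize(emb1) * l2_normalize(emb2), axis=-1)


def zero_shot_probs(image_emb: np.ndarray, text_emb: np.ndarray,
                    logit_scale: float) -> np.ndarray:
    """Softmax over scaled cosine similarities, one row of probabilities per image."""
    logits = logit_scale * l2_normalize(image_emb) @ l2_normalize(text_emb).T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


# Random stand-ins for the stored columns (NOT real data from the dataset).
rng = np.random.default_rng(0)
emb1 = rng.standard_normal((4, 1280)).astype(np.float32)  # image1_embedding0
emb2 = rng.standard_normal((4, 1280)).astype(np.float32)  # image2_embedding0
text_emb = rng.standard_normal((2, 1280)).astype(np.float32)  # e.g. "dog", "cat"

# Similarity between the two images in each pair.
sims = paired_cosine_similarity(emb1, emb2)

# Placeholder scale; in practice it would come from model.logit_scale.exp().
probs = zero_shot_probs(emb1, text_emb, logit_scale=100.0)
```

With real data, `sims` gives one score per row for how close the two images of a pair are in embedding space, and `probs` corresponds to the `probs` tensor in the PyTorch snippet above, provided the text embeddings come from the same `PE-Core-G14-448` model.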


Provided by:

scaleinvariant
