scaleinvariant/paired-open-images-embedded-pe-core-g14-448

Source: Hugging Face. Updated 2026-03-09; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/scaleinvariant/paired-open-images-embedded-pe-core-g14-448
Description:
---
license: cc-by-4.0
task_categories:
- image-feature-extraction
- zero-shot-image-classification
tags:
- open-images
- embeddings
- perception-encoder
- clip
- vision
size_categories:
- 100K<n<1M
---

# Paired Open Images with PE-Core-G14-448 Embeddings

This dataset contains **pairs** of images from [Open Images](https://storage.googleapis.com/openimages/web/index.html) along with their embeddings computed using Meta's [Perception Encoder](https://github.com/facebookresearch/perception_models) (`PE-Core-G14-448`).

Each row contains two images (as JPEG bytes), their metadata, and their corresponding 1280-dimensional embeddings.

## Data Layout

| Column | Description |
|--------|-------------|
| `image1_jpeg` | JPEG bytes for the first image |
| `image1_metadata` | Metadata for the first image |
| `image2_jpeg` | JPEG bytes for the second image |
| `image2_metadata` | Metadata for the second image |
| `image1_embedding0` | PE-Core-G14-448 embedding (dim=1280) for image 1 |
| `image2_embedding0` | PE-Core-G14-448 embedding (dim=1280) for image 2 |

## Splits

| Split | Files |
|-------|-------|
| train | 94 parquet shards |
| validation | 20 parquet shards |
| test | 5 parquet shards |

## Using the Embeddings

The embeddings behave like CLIP embeddings. You can use them for zero-shot classification, retrieval, or similarity search.

### Generating text embeddings for comparison

```bash
git clone https://github.com/facebookresearch/perception_models.git
cd perception_models
```

```python
import torch
import core.vision_encoder.pe as pe
import core.vision_encoder.transforms as transforms

model = pe.CLIP.from_config("PE-Core-G14-448", pretrained=True).cuda().eval()
tokenizer = transforms.get_text_tokenizer(model.context_length)

text_tokens = tokenizer(["dog", "cat"]).cuda()

with torch.no_grad(), torch.autocast("cuda"):
    _, text_features, _ = model(None, text_tokens)

# Compare with stored embeddings via cosine similarity
image_features = ...  # load from parquet
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

similarity = model.logit_scale.exp() * image_features @ text_features.T
probs = similarity.softmax(dim=-1)
```

## License

The images originate from Open Images, which is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
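
## Example: comparing stored embeddings without the model

The similarity and zero-shot steps described under "Using the Embeddings" can be tried without a GPU once the embeddings are in memory. The following is a minimal NumPy sketch, not the dataset author's code. It uses synthetic arrays as stand-ins for the `image1_embedding0` and `image2_embedding0` columns and for the text features. The helper names (`l2_normalize`, `pair_cosine_similarity`, `zero_shot_probs`) are hypothetical, and the default `logit_scale` of 100.0 is a placeholder assumption; the card's own snippet takes the scale from `model.logit_scale.exp()`.

```python
import numpy as np


def l2_normalize(x) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pair_cosine_similarity(emb1, emb2) -> np.ndarray:
    """Cosine similarity between row i of emb1 and row i of emb2."""
    a = l2_normalize(emb1)
    b = l2_normalize(emb2)
    return np.sum(a * b, axis=-1)


def zero_shot_probs(image_emb, text_emb, logit_scale: float = 100.0) -> np.ndarray:
    """Softmax over scaled cosine similarities, one row per image.

    logit_scale is a placeholder assumption; with the real model it would
    come from model.logit_scale.exp().
    """
    logits = logit_scale * l2_normalize(image_emb) @ l2_normalize(text_emb).T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


# Synthetic stand-ins for the stored columns (real rows are 1280-dimensional).
rng = np.random.default_rng(0)
image1 = rng.normal(size=(4, 1280))
image2 = image1 + 0.1 * rng.normal(size=(4, 1280))  # noisy copies of image1
text = rng.normal(size=(2, 1280))  # stand-in for two text prompts

pair_sims = pair_cosine_similarity(image1, image2)  # shape (4,)
probs = zero_shot_probs(image1, text)               # shape (4, 2)
```

With real data, the synthetic arrays would be replaced by embeddings read from a parquet shard, for example via `pandas.read_parquet` followed by `np.stack` on the embedding column. How the embedding column is serialized is not stated in the card, so that loading step is left as an assumption to verify against an actual shard.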

Provider:
scaleinvariant