scaleinvariant/paired-open-images-embedded-pe-core-g14-448
Source: Hugging Face (updated 2026-03-09, indexed 2026-03-29)
Download link:
https://hf-mirror.com/datasets/scaleinvariant/paired-open-images-embedded-pe-core-g14-448
Resource description:
---
license: cc-by-4.0
task_categories:
- image-feature-extraction
- zero-shot-image-classification
tags:
- open-images
- embeddings
- perception-encoder
- clip
- vision
size_categories:
- 100K<n<1M
---
# Paired Open Images with PE-Core-G14-448 Embeddings
This dataset contains **pairs** of images from [Open Images](https://storage.googleapis.com/openimages/web/index.html) along with their embeddings computed using Meta's [Perception Encoder](https://github.com/facebookresearch/perception_models) (`PE-Core-G14-448`).
Each row contains two images (as JPEG bytes), their metadata, and their corresponding 1280-dimensional embeddings.
## Data Layout
| Column | Description |
|--------|-------------|
| `image1_jpeg` | JPEG bytes for the first image |
| `image1_metadata` | Metadata for the first image |
| `image2_jpeg` | JPEG bytes for the second image |
| `image2_metadata` | Metadata for the second image |
| `image1_embedding0` | PE-Core-G14-448 embedding (dim=1280) for image 1 |
| `image2_embedding0` | PE-Core-G14-448 embedding (dim=1280) for image 2 |
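The column layout above can be sketched as a small row reader. This is a minimal sketch, assuming each embedding column holds a flat sequence of 1280 floats and each JPEG column holds raw bytes; the card does not document the storage types, and the helper names `row_to_arrays` and `looks_like_jpeg` are hypothetical.

```python
import numpy as np

EMBED_DIM = 1280  # embedding size stated in the card


def row_to_arrays(row):
    """Return (emb1, emb2) as float32 vectors from one dict-like dataset row.

    Assumption: each embedding column is a flat sequence of EMBED_DIM floats.
    """
    emb1 = np.asarray(row["image1_embedding0"], dtype=np.float32)
    emb2 = np.asarray(row["image2_embedding0"], dtype=np.float32)
    for name, emb in (("image1_embedding0", emb1), ("image2_embedding0", emb2)):
        if emb.shape != (EMBED_DIM,):
            raise ValueError(f"{name}: expected shape ({EMBED_DIM},), got {emb.shape}")
    return emb1, emb2


def looks_like_jpeg(data):
    """Check for the JPEG start-of-image marker (0xFF 0xD8) at the start of a payload."""
    return bytes(data[:2]) == b"\xff\xd8"
```

Validating the shape up front catches a mismatched column or a nested list early, before any similarity computation silently broadcasts the wrong way.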
## Splits
| Split | Files |
|-------|-------|
| train | 94 parquet shards |
| validation | 20 parquet shards |
| test | 5 parquet shards |
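One possible way to open a split is through the Hugging Face `datasets` library. The sketch below assumes the repository loads with the default parquet builder and that the split names match the table above; `open_split` is a hypothetical helper, and streaming is used so that the shards do not all have to be downloaded first.

```python
REPO_ID = "scaleinvariant/paired-open-images-embedded-pe-core-g14-448"

# Shard counts per split, as listed in the card
SPLIT_SHARDS = {"train": 94, "validation": 20, "test": 5}


def open_split(split, streaming=True):
    """Open one split of the dataset (hypothetical helper, not part of the card)."""
    if split not in SPLIT_SHARDS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLIT_SHARDS)}")
    # Imported lazily so the module can be inspected without `datasets` installed
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

For a quick look at the data, the `test` split is the smallest at 5 shards, for example `first_row = next(iter(open_split("test")))`.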
## Using the Embeddings
The embeddings can be used like CLIP image embeddings: compare them with text embeddings from the same model for zero-shot classification, or with each other for retrieval and similarity search.
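For retrieval and similarity search, cosine similarity between L2-normalized vectors is the usual choice with CLIP-style embeddings. The following is a minimal NumPy sketch under that assumption; `l2_normalize` and `top_k_similar` are illustrative names, not part of the dataset or of the Perception Encoder code.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps guards against zero vectors)."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def top_k_similar(query, gallery, k=5):
    """Return (indices, scores) of the k gallery rows most similar to the query.

    query:   shape (D,)
    gallery: shape (N, D)
    """
    q = l2_normalize(query)
    g = l2_normalize(gallery)
    scores = g @ q  # cosine similarity per gallery row, shape (N,)
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]
```

With `gallery` built by stacking `image1_embedding0` values and `query` taken from an `image2_embedding0` value, this ranks first images by their similarity to a given second image.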
### Generating text embeddings for comparison
```bash
git clone https://github.com/facebookresearch/perception_models.git
cd perception_models
```
```python
import torch
import core.vision_encoder.pe as pe
import core.vision_encoder.transforms as transforms

# Load PE-Core-G14-448, the model used for the stored image embeddings
model = pe.CLIP.from_config("PE-Core-G14-448", pretrained=True).cuda().eval()
tokenizer = transforms.get_text_tokenizer(model.context_length)
text_tokens = tokenizer(["dog", "cat"]).cuda()

# Encode text only; the image input is None
with torch.no_grad(), torch.autocast("cuda"):
    _, text_features, _ = model(None, text_tokens)

# Compare with stored embeddings via cosine similarity
image_features = ...  # load from parquet
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = model.logit_scale.exp() * image_features @ text_features.T
probs = similarity.softmax(dim=-1)
```
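The comparison step can also be run on the CPU once text embeddings have been computed and saved. The sketch below mirrors the normalize, scale, softmax sequence in NumPy; the function name `zero_shot_probs` is hypothetical, and the default `logit_scale=100.0` is only a placeholder, so the model's own `logit_scale.exp()` value should be passed in when matching its outputs matters.

```python
import numpy as np


def zero_shot_probs(image_features, text_features, logit_scale=100.0):
    """Softmax over scaled cosine similarities.

    image_features: shape (N, D), stored image embeddings
    text_features:  shape (C, D), one embedding per class prompt
    Returns an (N, C) array whose rows sum to 1.
    """
    img = np.asarray(image_features, dtype=np.float32)
    txt = np.asarray(text_features, dtype=np.float32)
    img = img / np.linalg.norm(img, axis=-1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=-1, keepdims=True)
    logits = logit_scale * img @ txt.T
    # Subtract the row maximum before exponentiating for numerical stability
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)
```

Casting both inputs to `float32` avoids dtype mismatches when the text features were produced in half precision under `torch.autocast`.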
## License
The images originate from Open Images, which is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
Provider:
scaleinvariant



