Meldashti/vlm-compositionality-embeddings
Source: Hugging Face | Updated: 2026-04-07 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/Meldashti/vlm-compositionality-embeddings
Resource description:
---
license: mit
task_categories:
- zero-shot-classification
- image-classification
tags:
- compositionality
- vision-language-models
- hyperbolic-geometry
- GDE
- CZSL
- group-robustness
- embeddings
pretty_name: VLM Compositionality Embeddings
size_categories:
- 100K<n<1M
---
# VLM Compositionality Embeddings
Pre-computed image and text embeddings for the thesis **"From Euclidean to Hyperbolic Vision-Language Spaces: A Study of Attribute–Object Compositionality"** by Meelad Dashti (Politecnico di Torino & University of Twente, 2026).
Code repository: [github.com/MelDashti/hyperbolic-vlm-compositionality](https://github.com/MelDashti/hyperbolic-vlm-compositionality)
## Models
| Model | Geometry | Architecture | Training Data |
|-------|----------|-------------|---------------|
| **CLIP ViT-L/14** | Spherical | ViT-L/14 | WIT (400M+ pairs) |
| **DINOv2 ViT-L/14** | Spherical | ViT-L/14 | LVD-142M (self-supervised) |
| **CLIP-B (GRIT)** | Spherical | ViT-B/16 | GRIT (20.5M pairs) |
| **MERU-B (GRIT)** | Hyperbolic | ViT-B/16 | GRIT (20.5M pairs) |
| **HyCoCLIP-B (GRIT)** | Hyperbolic | ViT-B/16 | GRIT (20.5M pairs) |
## Datasets
### CZSL Benchmarks
- **MIT-States** — 53K images, 115 attributes, 245 objects
- **UT-Zappos** — 33K images, 16 attributes, 12 objects
- **C-GQA** — 39K images, 413 attributes, 674 objects
- **VAW-CZSL** — 92K images, 440 attributes, 541 objects
### Group Robustness
- **WaterBirds** — Bird type classification (spurious: background)
- **CelebA** — Hair color classification (spurious: gender)
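
Each group-robustness dataset pairs a target label with a spurious attribute, so results are usually broken down by (label, spurious attribute) group. The card does not state which metric the thesis reports; the sketch below assumes per-group accuracy and its minimum (worst-group accuracy). The helper name `group_accuracies` and the toy labels are illustrative only and are not taken from the repository.

```python
from collections import defaultdict


def group_accuracies(labels, spurious, preds):
    """Return accuracy for every (label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, s, p in zip(labels, spurious, preds):
        total[(y, s)] += 1
        correct[(y, s)] += int(p == y)
    return {group: correct[group] / total[group] for group in total}


# Toy WaterBirds-style data (made up): label = bird type, spurious = background.
labels = ["water", "water", "land", "land", "water", "land"]
spurious = ["water", "land", "land", "water", "water", "land"]
preds = ["water", "land", "land", "land", "water", "land"]

acc = group_accuracies(labels, spurious, preds)
worst_group = min(acc, key=acc.get)  # group with the lowest accuracy
worst_acc = acc[worst_group]
```

In this toy case the only waterbird on a land background is misclassified, so that group has the lowest accuracy even though overall accuracy is high.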
## File Structure
Each dataset directory contains:
```
{dataset}/
├── IMGemb_{model}_{pretraining}.pt              # Image embeddings
├── TEXTemb_{model}_{pretraining}.pt             # Text pair embeddings
├── TEXTemb_primitives_{model}_{pretraining}.pt  # Primitive text embeddings
├── metadata_compositional-split-natural.t7      # Dataset metadata
└── compositional-split-natural/
    ├── train_pairs.txt
    ├── val_pairs.txt
    └── test_pairs.txt
```
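
The split files under `compositional-split-natural/` can be read with a few lines of Python. This is a minimal sketch that assumes each line holds one attribute and one object separated by whitespace, which is the usual layout for CZSL splits but is not stated on this card. `read_pairs` and `unseen_pairs` are hypothetical helper names.

```python
from pathlib import Path


def read_pairs(path):
    """Read (attribute, object) pairs from a split file.

    Assumption: one pair per line, attribute first, whitespace-separated.
    """
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pairs.append((parts[0], parts[1]))
    return pairs


def unseen_pairs(train_pairs, test_pairs):
    """Return the test pairs that never occur in the training split."""
    seen = set(train_pairs)
    return [pair for pair in test_pairs if pair not in seen]
```

A typical call would be `read_pairs("mit-states/compositional-split-natural/train_pairs.txt")`, with the path adjusted to wherever the dataset was downloaded.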
### File Naming Convention
- `IMGemb_` — Image embeddings (one vector per image)
- `TEXTemb_` — Text embeddings for (attribute, object) pair prompts
- `TEXTemb_primitives_` — Separate attribute and object text embeddings
- Model identifiers: `ViT-L-14_openai`, `CLIP-B_GRIT_GRIT`, `MERU-B_GRIT_GRIT`, `HyCoCLIP-B_HyCoCLIP`, `dinov2_vitl14_talk2dino`, `MERU-L_MERU`
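
The naming convention above can be turned into a small path builder. The sketch below is illustrative: `embedding_paths` is a hypothetical helper, and it only constructs the expected file names without checking that every model has all three files in every dataset directory.

```python
from pathlib import Path

# Model identifiers copied from the list above.
MODEL_IDS = (
    "ViT-L-14_openai",
    "CLIP-B_GRIT_GRIT",
    "MERU-B_GRIT_GRIT",
    "HyCoCLIP-B_HyCoCLIP",
    "dinov2_vitl14_talk2dino",
    "MERU-L_MERU",
)


def embedding_paths(root, dataset, model_id):
    """Return the three expected embedding file paths for one dataset/model."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model identifier: {model_id!r}")
    base = Path(root) / dataset
    return {
        "image": base / f"IMGemb_{model_id}.pt",
        "text_pairs": base / f"TEXTemb_{model_id}.pt",
        "text_primitives": base / f"TEXTemb_primitives_{model_id}.pt",
    }


paths = embedding_paths("data", "mit-states", "ViT-L-14_openai")
print(paths["image"].as_posix())  # data/mit-states/IMGemb_ViT-L-14_openai.pt
```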
## Usage
```python
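import torch

# weights_only=False lets torch.load unpickle arbitrary Python objects,
# so only use it on files from a source you trust.

# Load image embeddings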
img_emb = torch.load("mit-states/IMGemb_ViT-L-14_openai.pt", weights_only=False)
# Load text pair embeddings
text_emb = torch.load("mit-states/TEXTemb_ViT-L-14_openai.pt", weights_only=False)
# Load primitive text embeddings
primitives = torch.load("mit-states/TEXTemb_primitives_ViT-L-14_openai.pt", weights_only=False)
attr_embs = primitives['attr_embs'] # Individual attribute embeddings
obj_embs = primitives['obj_embs'] # Individual object embeddings
```
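
Once loaded, the embeddings can be used for zero-shot pair classification. The sketch below is not the thesis code: it assumes the image embeddings form an `(N, D)` tensor and the text pair embeddings a `(P, D)` tensor (the card does not document the stored layout), and it scores with cosine similarity, which suits the spherical models. The hyperbolic models would likely need a different similarity, which is not covered here. Random tensors stand in for the real files so the example runs on its own.

```python
import torch
import torch.nn.functional as F


def zero_shot_predict(img_emb, text_emb):
    """Return, for each image, the index of the most similar text pair.

    Assumes img_emb is (N, D) and text_emb is (P, D). Both are L2-normalised
    here, so the dot product equals cosine similarity.
    """
    img = F.normalize(img_emb.float(), dim=-1)
    txt = F.normalize(text_emb.float(), dim=-1)
    scores = img @ txt.T          # (N, P) cosine similarities
    return scores.argmax(dim=-1)  # (N,) predicted pair indices


# Random stand-ins with assumed shapes; replace with the loaded tensors.
torch.manual_seed(0)
text_emb = torch.randn(5, 8)
labels = torch.tensor([3, 0, 4])
# Each fake image is a rescaled, slightly noisy copy of one text embedding.
img_emb = 2.0 * text_emb[labels] + 0.01 * torch.randn(3, 8)

pred = zero_shot_predict(img_emb, text_emb)
accuracy = (pred == labels).float().mean().item()
```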
## Download
```bash
# Clone with git LFS
git lfs install
git clone https://huggingface.co/datasets/Meldashti/vlm-compositionality-embeddings

# Or with huggingface_hub (Python), run here through a shell heredoc
python - <<'EOF'
from huggingface_hub import snapshot_download
snapshot_download("Meldashti/vlm-compositionality-embeddings", local_dir="data/", repo_type="dataset")
EOF
```
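
To fetch only one dataset and one model rather than the full repository, `snapshot_download` accepts an `allow_patterns` filter. The sketch below assumes the directory layout described under File Structure; `patterns_for` and `download_subset` are hypothetical helper names, and the download itself is only defined, not executed.

```python
def patterns_for(dataset, model_id):
    """Build glob patterns for one dataset directory and one model identifier."""
    return [
        f"{dataset}/IMGemb_{model_id}.pt",
        f"{dataset}/TEXTemb_{model_id}.pt",
        f"{dataset}/TEXTemb_primitives_{model_id}.pt",
        f"{dataset}/metadata_*.t7",
        f"{dataset}/compositional-split-natural/*",
    ]


def download_subset(dataset, model_id, local_dir="data/"):
    """Download only the files matching patterns_for(dataset, model_id)."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "Meldashti/vlm-compositionality-embeddings",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=patterns_for(dataset, model_id),
    )
```

For example, `download_subset("mit-states", "ViT-L-14_openai")` would request the MIT-States files for CLIP ViT-L/14, provided the repository follows the layout shown earlier.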
## Citation
```bibtex
@mastersthesis{dashti2026euclidean,
title={From Euclidean to Hyperbolic Vision-Language Spaces: A Study of Attribute-Object Compositionality},
author={Dashti, Meelad},
school={Politecnico di Torino \& University of Twente},
year={2026}
}
```
## License
MIT
Provider:
Meldashti



