Meldashti/vlm-compositionality-embeddings

Name: Meldashti/vlm-compositionality-embeddings
Creator: Meldashti
Published: 2026-04-07 19:29:29
License: MIT

Source: Hugging Face. Updated: 2026-04-07. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/Meldashti/vlm-compositionality-embeddings


Description:

---
license: mit
task_categories:
- zero-shot-classification
- image-classification
tags:
- compositionality
- vision-language-models
- hyperbolic-geometry
- GDE
- CZSL
- group-robustness
- embeddings
pretty_name: VLM Compositionality Embeddings
size_categories:
- 100K<n<1M
---

# VLM Compositionality Embeddings

Pre-computed image and text embeddings for the thesis **"From Euclidean to Hyperbolic Vision-Language Spaces: A Study of Attribute–Object Compositionality"** by Meelad Dashti (Politecnico di Torino & University of Twente, 2026).

Code repository: [github.com/MelDashti/hyperbolic-vlm-compositionality](https://github.com/MelDashti/hyperbolic-vlm-compositionality)

## Models

| Model | Geometry | Architecture | Training Data |
|-------|----------|-------------|---------------|
| **CLIP ViT-L/14** | Spherical | ViT-L/14 | WIT (400M+ pairs) |
| **DINOv2 ViT-L/14** | Spherical | ViT-L/14 | LVD-142M (self-supervised) |
| **CLIP-B (GRIT)** | Spherical | ViT-B/16 | GRIT (20.5M pairs) |
| **MERU-B (GRIT)** | Hyperbolic | ViT-B/16 | GRIT (20.5M pairs) |
| **HyCoCLIP-B (GRIT)** | Hyperbolic | ViT-B/16 | GRIT (20.5M pairs) |

## Datasets

### CZSL Benchmarks

- **MIT-States**: 53K images, 115 attributes, 245 objects
- **UT-Zappos**: 33K images, 16 attributes, 12 objects
- **C-GQA**: 39K images, 413 attributes, 674 objects
- **VAW-CZSL**: 92K images, 413 attributes, 541 objects

### Group Robustness

- **WaterBirds**: Bird type classification (spurious: background)
- **CelebA**: Hair color classification (spurious: gender)

## File Structure

Each dataset directory contains:

```
{dataset}/
├── IMGemb_{model}_{pretraining}.pt              # Image embeddings
├── TEXTemb_{model}_{pretraining}.pt             # Text pair embeddings
├── TEXTemb_primitives_{model}_{pretraining}.pt  # Primitive text embeddings
├── metadata_compositional-split-natural.t7      # Dataset metadata
└── compositional-split-natural/
    ├── train_pairs.txt
    ├── val_pairs.txt
    └── test_pairs.txt
```

### File Naming Convention

- `IMGemb_`: Image embeddings (one vector per image)
- `TEXTemb_`: Text embeddings for (attribute, object) pair prompts
- `TEXTemb_primitives_`: Separate attribute and object text embeddings
- Model identifiers: `ViT-L-14_openai`, `CLIP-B_GRIT_GRIT`, `MERU-B_GRIT_GRIT`, `HyCoCLIP-B_HyCoCLIP`, `dinov2_vitl14_talk2dino`, `MERU-L_MERU`

## Usage

```python
import torch

# Load image embeddings
img_emb = torch.load("mit-states/IMGemb_ViT-L-14_openai.pt", weights_only=False)

# Load text pair embeddings
text_emb = torch.load("mit-states/TEXTemb_ViT-L-14_openai.pt", weights_only=False)

# Load primitive text embeddings
primitives = torch.load("mit-states/TEXTemb_primitives_ViT-L-14_openai.pt", weights_only=False)
attr_embs = primitives['attr_embs']  # Individual attribute embeddings
obj_embs = primitives['obj_embs']    # Individual object embeddings
```

## Download

Clone the repository with Git LFS:

```bash
git lfs install
git clone https://huggingface.co/datasets/Meldashti/vlm-compositionality-embeddings
```

Or download it with `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

snapshot_download("Meldashti/vlm-compositionality-embeddings", local_dir="data/", repo_type="dataset")
```

## Citation

```bibtex
@mastersthesis{dashti2026euclidean,
  title={From Euclidean to Hyperbolic Vision-Language Spaces: A Study of Attribute-Object Compositionality},
  author={Dashti, Meelad},
  school={Politecnico di Torino \& University of Twente},
  year={2026}
}
```

## License

MIT
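
To make the file naming convention from the File Structure section concrete, the sketch below builds the three embedding paths for one dataset/model pair and parses the contents of a pairs file. The helper names `embedding_paths` and `parse_pairs`, the `data` root directory, and the one-pair-per-line format assumed for `train_pairs.txt`, `val_pairs.txt`, and `test_pairs.txt` are illustrative assumptions, not part of the repository. Not every dataset directory necessarily ships files for every model identifier, so check that a path exists before loading it.

```python
from pathlib import Path

# Model identifiers exactly as listed under "File Naming Convention".
MODEL_IDS = (
    "ViT-L-14_openai",
    "CLIP-B_GRIT_GRIT",
    "MERU-B_GRIT_GRIT",
    "HyCoCLIP-B_HyCoCLIP",
    "dinov2_vitl14_talk2dino",
    "MERU-L_MERU",
)


def embedding_paths(root, dataset, model_id):
    """Build the three embedding file paths for one dataset/model pair.

    Hypothetical helper: it only applies the documented naming pattern
    and does not check that the files exist on disk.
    """
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model identifier: {model_id!r}")
    base = Path(root) / dataset
    return {
        "image": base / f"IMGemb_{model_id}.pt",
        "text_pairs": base / f"TEXTemb_{model_id}.pt",
        "text_primitives": base / f"TEXTemb_primitives_{model_id}.pt",
    }


def parse_pairs(text):
    """Parse pair-file contents into (attribute, object) tuples.

    Assumption: each non-blank line holds exactly two whitespace-separated
    tokens, "attribute object". Any other line raises ValueError.
    """
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"unexpected pair line: {line!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


paths = embedding_paths("data", "mit-states", "ViT-L-14_openai")
print(paths["image"].as_posix())  # data/mit-states/IMGemb_ViT-L-14_openai.pt
```

Each returned path can be passed to `torch.load(..., weights_only=False)` as in the Usage section, after confirming with `path.exists()` that the file was actually downloaded.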

Provider:

Meldashti
