alessiasaporita/MissRAG
Hugging Face · Updated 2026-03-30 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/alessiasaporita/MissRAG
---
license: apache-2.0
tags:
- multimodal
- embeddings
- retrieval
- rag
- audio
- video
- text
- multimodal-llm
---
# MissRAG Embeddings & Modality Tokens
This repository provides **precomputed multimodal embeddings and modality tokens** used in the MissRAG framework:
> **MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models**
---
## 📌 Overview
We release:

- 🔹 **ImageBind-based embeddings** for multimodal retrieval
- 🔹 **Precomputed modality tokens** (audio/video) for efficient inference

These representations are designed to:

- enable **retrieval across modalities**
- support **missing modality scenarios**
- accelerate inference in multimodal LLMs
---
## 📁 Repository Structure
The repository is organized as follows:
```text
MissRAG/
├── IB_embeddings/          # ImageBind embeddings
└── modality_tokens/
    ├── chatbridge/         # ChatBridge modality tokens
    └── onellm/             # OneLLM modality tokens
```
### 1. Multimodal Embeddings
We provide precomputed multimodal embeddings obtained with [ImageBind](https://github.com/facebookresearch/ImageBind), which:

- align **audio**, **video**, and **text** in a shared representation space
- enable **cross-modal similarity computation**
- support efficient **retrieval via inner-product similarity**

Because the space is shared, a query in any available modality can retrieve semantically related samples of the missing modalities.
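
As a minimal sketch of cross-modal retrieval in a shared space, the snippet below uses random vectors as stand-ins for ImageBind embeddings. The helper names `l2_normalize` and `cross_modal_topk`, the 1024-dimensional size, the bank sizes, and the normalization step are all assumptions for illustration, not the layout of the files in `IB_embeddings/`. A query from the available modality (here video) is scored against a bank of the missing modality (here audio) by inner product, and the top-k indices are returned.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so the inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def cross_modal_topk(query: np.ndarray, bank: np.ndarray, k: int):
    """Return (scores, indices) of the k bank rows with the largest inner product.

    query: (n_queries, d) embeddings of the available modality (e.g. video).
    bank:  (n_items, d) embeddings of the missing modality (e.g. audio).
    """
    scores = query @ bank.T                    # (n_queries, n_items) inner products
    idx = np.argsort(-scores, axis=1)[:, :k]   # indices by descending score
    top_scores = np.take_along_axis(scores, idx, axis=1)
    return top_scores, idx

# Hypothetical stand-ins for precomputed ImageBind embeddings.
rng = np.random.default_rng(0)
dim = 1024
audio_bank = l2_normalize(rng.standard_normal((500, dim)).astype(np.float32))
video_query = l2_normalize(rng.standard_normal((3, dim)).astype(np.float32))

scores, indices = cross_modal_topk(video_query, audio_bank, k=5)
print(indices.shape)  # (3, 5)
```

Brute-force scoring like this is only practical for small banks; the Usage section below uses FAISS for the same inner-product search at scale.
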
### 2. Modality Tokens
We release precomputed modality-specific tokens for the audio and video modalities. These tokens are directly compatible with:

- [ChatBridge](https://github.com/CASIA-IVA-Lab/ChatBridge)
- [OneLLM](https://github.com/csuhan/onellm)

Precomputing modality tokens provides the following advantages:

- **Computational efficiency**: eliminates redundant forward passes over the training data
- **Faster inference**: enables real-time retrieval-augmented generation
- **Scalability**: supports large-scale retrieval without recomputing representations
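
To illustrate why precomputing pays off, here is a minimal sketch of an offline token cache. The `CountingEncoder` stand-in, the token shape `(8, 16)`, the sample names, and the dictionary layout are hypothetical; the actual file format of `modality_tokens/chatbridge/` and `modality_tokens/onellm/` is not described here. Encoding runs once per sample offline, and at inference time retrieved indices are mapped to stored tokens without any encoder forward pass.

```python
import numpy as np

N_TOKENS, D_MODEL = 8, 16  # hypothetical token count and hidden size

class CountingEncoder:
    """Stand-in for a modality encoder that counts its forward passes."""
    def __init__(self, seed: int = 0):
        self.calls = 0
        self.rng = np.random.default_rng(seed)

    def __call__(self, sample) -> np.ndarray:
        self.calls += 1
        return self.rng.standard_normal((N_TOKENS, D_MODEL)).astype(np.float32)

def precompute_tokens(encoder, samples) -> dict:
    """Offline step: encode every sample once and keep its tokens keyed by index."""
    return {i: encoder(s) for i, s in enumerate(samples)}

def gather_tokens(cache: dict, retrieved_ids) -> np.ndarray:
    """Inference step: look up tokens for retrieved samples; no encoder call."""
    return np.stack([cache[int(i)] for i in retrieved_ids])  # (k, N_TOKENS, D_MODEL)

encoder = CountingEncoder()
samples = [f"clip_{i}" for i in range(100)]   # hypothetical training samples
cache = precompute_tokens(encoder, samples)
calls_after_precompute = encoder.calls

# Several queries, each with hypothetical top-3 ids from the retrieval step.
for query_ids in ([3, 14, 15], [9, 2, 6], [5, 3, 5]):
    batch = gather_tokens(cache, query_ids)

print(encoder.calls, batch.shape)  # 100 (3, 8, 16)
```

The encoder count stays at 100 however many queries are served. How MissRAG feeds the tokens of several retrieved samples into ChatBridge or OneLLM is not shown in this sketch.
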
## ⚙️ Usage
### 🔹 Retrieval
We perform retrieval with **FAISS** for efficient nearest-neighbor search in the embedding space.

Given a query embedding, we retrieve the top-k most similar prototypes by inner-product similarity:
```python
# D: similarity scores, I: indices of the top-k prototypes (both shaped (n_queries, k))
D, I = index.search(query, k)
```

Provided by: alessiasaporita



