TREC-AToMiC/AToMiC-Baselines
AToMiC Prebuilt Indexes
Example Usage
Reproduction
    # Skip the encoding and indexing steps; search directly with the prebuilt indexes and topics.

    # Text-to-image retrieval: text topics against the image index
    python search.py \
        --topics topics/openai.clip-vit-base-patch32.text.validation \
        --index indexes/openai.clip-vit-base-patch32.image.faiss.flat \
        --hits 1000 \
        --output runs/run.openai.clip-vit-base-patch32.validation.t2i.large.trec

    # Image-to-text retrieval: image topics against the text index
    python search.py \
        --topics topics/openai.clip-vit-base-patch32.image.validation \
        --index indexes/openai.clip-vit-base-patch32.text.faiss.flat \
        --hits 1000 \
        --output runs/run.openai.clip-vit-base-patch32.validation.i2t.large.trec
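The `.trec` suffix on the output files suggests the standard six-column TREC run layout (`query_id Q0 doc_id rank score tag`). Assuming that layout, a minimal sketch for loading a run into Python might look like the following. The `parse_trec_run` helper and the sample lines are hypothetical and are not part of this repository.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Group (doc_id, rank, score) tuples by query id.

    Assumes the six-column TREC run layout:
    query_id Q0 doc_id rank score tag
    """
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, doc_id, rank, score, _tag = parts
        run[qid].append((doc_id, int(rank), float(score)))
    return dict(run)


# Made-up sample lines, for illustration only (not real AToMiC results).
sample = [
    "q1 Q0 img_a 1 0.91 demo",
    "q1 Q0 img_b 2 0.85 demo",
    "q2 Q0 img_c 1 0.77 demo",
]
run = parse_trec_run(sample)
print(run["q1"][0])
```

To read one of the run files produced above, pass an open file handle instead of the sample list, for example `parse_trec_run(open(path))`.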
Explore the AToMiC datasets
    import torch
    from pathlib import Path
    from datasets import load_dataset
    from transformers import AutoModel, AutoProcessor

    INDEX_DIR = "indexes"
    INDEX_NAME = "openai.clip-vit-base-patch32.image.faiss.flat"
    QUERY = "Elizabeth II"

    # Load the image collection and attach the prebuilt Faiss index to it.
    images = load_dataset("TREC-AToMiC/AToMiC-Images-v0.2", split="train")
    images.load_faiss_index(index_name=INDEX_NAME, file=Path(INDEX_DIR, INDEX_NAME, "index"))

    model = AutoModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # The prebuilt indexes contain L2-normalized vectors, so normalize the query embedding too.
    with torch.no_grad():
        q_embedding = model.get_text_features(**processor(text=QUERY, return_tensors="pt"))
        q_embedding = torch.nn.functional.normalize(q_embedding, dim=-1).detach().numpy()

    # Retrieve the 10 images nearest to the query.
    scores, retrieved = images.get_nearest_examples(INDEX_NAME, q_embedding, k=10)
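The query is normalized because the indexed vectors are L2-normalized: if the flat index scores by inner product (an assumption here, not something this page confirms), then the inner product of two unit vectors is their cosine similarity. The sketch below illustrates that identity with NumPy on random vectors. The array shapes and the `l2_normalize` helper are illustrative only and are unrelated to the actual CLIP embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8)).astype("float32")   # stand-in for indexed vectors
query = rng.normal(size=(8,)).astype("float32")    # stand-in for a query embedding


def l2_normalize(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


docs_n = l2_normalize(docs)
query_n = l2_normalize(query)

# Inner product of unit vectors ...
inner = docs_n @ query_n
# ... matches cosine similarity computed on the raw vectors.
cosine = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

ranking = np.argsort(-inner)  # document positions, best match first
print(np.allclose(inner, cosine, atol=1e-5))
```

Skipping the normalization on the query side would leave the ranking for a single query unchanged (every score is scaled by the same factor), but the scores would no longer be comparable across queries.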



