TREC-AToMiC/AToMiC-Baselines

Name: TREC-AToMiC/AToMiC-Baselines
Creator: TREC-AToMiC
Published: 2023-10-22 22:10:13
License: No description available

Hugging Face, updated 2023-10-22, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/TREC-AToMiC/AToMiC-Baselines

Resource description:

# AToMiC Prebuilt Indexes

## Example Usage:

### Reproduction

Toolkits: https://github.com/TREC-AToMiC/AToMiC/tree/main/examples/dense_retriever_baselines

```bash
# Skip the encode and index steps, search with the prebuilt indexes and topics directly
python search.py \
    --topics topics/openai.clip-vit-base-patch32.text.validation \
    --index indexes/openai.clip-vit-base-patch32.image.faiss.flat \
    --hits 1000 \
    --output runs/run.openai.clip-vit-base-patch32.validation.t2i.large.trec

python search.py \
    --topics topics/openai.clip-vit-base-patch32.image.validation \
    --index indexes/openai.clip-vit-base-patch32.text.faiss.flat \
    --hits 1000 \
    --output runs/run.openai.clip-vit-base-patch32.validation.i2t.large.trec
```

### Explore AToMiC datasets

```python
import torch
from pathlib import Path
from datasets import load_dataset
from transformers import AutoModel, AutoProcessor

INDEX_DIR = 'indexes'
INDEX_NAME = 'openai.clip-vit-base-patch32.image.faiss.flat'
QUERY = 'Elizabeth II'

# Load the image collection and attach the prebuilt Faiss index to it
images = load_dataset('TREC-AToMiC/AToMiC-Images-v0.2', split='train')
images.load_faiss_index(index_name=INDEX_NAME, file=Path(INDEX_DIR, INDEX_NAME, 'index'))

model = AutoModel.from_pretrained('openai/clip-vit-base-patch32')
processor = AutoProcessor.from_pretrained('openai/clip-vit-base-patch32')

# prebuilt indexes contain L2-normalized vectors
with torch.no_grad():
    q_embedding = model.get_text_features(**processor(text=QUERY, return_tensors="pt"))
    q_embedding = torch.nn.functional.normalize(q_embedding, dim=-1).detach().numpy()

# Retrieve the 10 nearest images for the text query
scores, retrieved = images.get_nearest_examples(INDEX_NAME, q_embedding, k=10)
```
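
The note that the prebuilt indexes contain L2-normalized vectors affects how scores can be read: when the query and the indexed vectors both have unit length, their inner product equals their cosine similarity. Below is a minimal, dependency-free sketch of that identity. The helper names (`l2_normalize`, `dot`, `cosine`) and the toy vectors are illustrative assumptions and not part of the AToMiC toolkit. That the flat index scores by inner product is also an assumption, not something stated above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (hypothetical helper, not from the toolkit)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy stand-ins for a query embedding and an indexed embedding
query = [3.0, 4.0, 0.0]
doc = [1.0, 2.0, 2.0]

q_unit = l2_normalize(query)
d_unit = l2_normalize(doc)

# Inner product of the unit vectors matches cosine similarity of the originals
ip_score = dot(q_unit, d_unit)
cos_score = cosine(query, doc)
print(ip_score, cos_score)
```

This is why the query embedding is normalized before searching. Skipping that step would scale every score by the query's norm, so the scores would no longer be cosine similarities.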

Provider:

TREC-AToMiC
