
TREC-AToMiC/AToMiC-Baselines

Source: Hugging Face. Updated 2023-10-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/TREC-AToMiC/AToMiC-Baselines
Resource description:
# AToMiC Prebuilt Indexes

## Example Usage:

### Reproduction

Toolkits: https://github.com/TREC-AToMiC/AToMiC/tree/main/examples/dense_retriever_baselines

```bash
# Skip the encode and index steps, search with the prebuilt indexes and topics directly

# Text-to-image: text topics against the image index
python search.py \
    --topics topics/openai.clip-vit-base-patch32.text.validation \
    --index indexes/openai.clip-vit-base-patch32.image.faiss.flat \
    --hits 1000 \
    --output runs/run.openai.clip-vit-base-patch32.validation.t2i.large.trec

# Image-to-text: image topics against the text index
python search.py \
    --topics topics/openai.clip-vit-base-patch32.image.validation \
    --index indexes/openai.clip-vit-base-patch32.text.faiss.flat \
    --hits 1000 \
    --output runs/run.openai.clip-vit-base-patch32.validation.i2t.large.trec
```

### Explore AToMiC datasets

```python
import torch
from pathlib import Path
from datasets import load_dataset
from transformers import AutoModel, AutoProcessor

INDEX_DIR = 'indexes'
INDEX_NAME = 'openai.clip-vit-base-patch32.image.faiss.flat'
QUERY = 'Elizabeth II'

# Load the image collection and attach the prebuilt FAISS index to it
images = load_dataset('TREC-AToMiC/AToMiC-Images-v0.2', split='train')
images.load_faiss_index(index_name=INDEX_NAME, file=Path(INDEX_DIR, INDEX_NAME, 'index'))

model = AutoModel.from_pretrained('openai/clip-vit-base-patch32')
processor = AutoProcessor.from_pretrained('openai/clip-vit-base-patch32')

# prebuilt indexes contain L2-normalized vectors
with torch.no_grad():
    q_embedding = model.get_text_features(**processor(text=QUERY, return_tensors="pt"))
    q_embedding = torch.nn.functional.normalize(q_embedding, dim=-1).detach().numpy()

# Retrieve the 10 images whose embeddings are closest to the query embedding
scores, retrieved = images.get_nearest_examples(INDEX_NAME, q_embedding, k=10)
```
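The note that the prebuilt indexes contain L2-normalized vectors is the reason the query embedding is normalized before searching. The following is a minimal sketch of that reasoning using random toy vectors in place of real CLIP embeddings. The helper `l2_normalize` and the toy arrays are hypothetical and not part of the AToMiC toolkit, and the source does not say which metric the flat index uses, so the sketch checks both inner product and L2 distance.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (hypothetical helper for this sketch)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 512))   # toy stand-in for indexed image embeddings
query = rng.normal(size=(1, 512))    # toy stand-in for a text query embedding

docs_n = l2_normalize(docs)
query_n = l2_normalize(query)

# Cosine similarity computed from the raw (unnormalized) vectors
cosine = (docs @ query.T).ravel() / (
    np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
)

# Inner product between unit vectors
inner = (docs_n @ query_n.T).ravel()

# Squared L2 distance between unit vectors
sq_l2 = ((docs_n - query_n) ** 2).sum(axis=1)

# Top-10 rankings under each scoring function
top_cosine = np.argsort(-cosine)[:10]
top_inner = np.argsort(-inner)[:10]
top_l2 = np.argsort(sq_l2)[:10]
```

On unit vectors the inner product equals the cosine similarity, and the squared L2 distance equals `2 - 2 * inner`, so all three rankings agree. If the query were left unnormalized, an L2-based index would no longer rank by cosine similarity.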
Provided by:
TREC-AToMiC
