CLIP Features and Selected Relevance Judgments Subset for TRECVID Ad-hoc Search (2019-2023)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13941107
Description:
This repository contains CLIP features and annotations for a subset of V3C images, based on their relevance to selected queries from the TREC Video Retrieval Evaluation (TRECVID) Ad-hoc Video Search (AVS) task. The data includes annotations for AVS queries and judgments conducted in TRECVID from 2019 to 2023 [1], using the V3C1 and V3C2 collections [2]. Specifically, the TRECVID-AVS collection covers 89 queries, with video shots manually labeled as relevant (1), non-relevant (0), or not annotated (-1).
We used approximately 2.6 million keyframes extracted from these video shots, mapping the annotations to the corresponding keyframes (note that TRECVID shot IDs and keyframes may not correspond one-to-one, since multiple frames might be extracted from a single shot). Image representations are based on CLIP ViT-H/14 - LAION-2B features [3]. The timestamps of the keyframes and their CLIP features are sourced from the VISIONE repository [4].
Given the incomplete nature of the TRECVID ground truth (where only a subset of video segments were judged per query), we focused on queries with at least 200 positive and 1400 negative annotations. This resulted in 80 datasets, each containing 1500 images: 10% (150) labeled as relevant and 90% (1350) as non-relevant.
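The selection described above can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how the 1500 images were drawn from the judged keyframes, so uniform random sampling without replacement is assumed here, and the function name, parameter names, and seed are hypothetical. The counts 150 and 1350 are simply 10% and 90% of 1500.

```python
import numpy as np

def build_query_subset(labels, n_pos=150, n_neg=1350,
                       min_pos=200, min_neg=1400, seed=0):
    """Pick keyframe indices for one query, or return None if the query
    does not meet the annotation thresholds.

    labels: 1-D array with 1 (relevant), 0 (non-relevant), -1 (not annotated).
    ASSUMPTION: uniform random sampling; the source does not state the method.
    """
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    # Keep only queries with enough judged positives and negatives.
    if len(pos) < min_pos or len(neg) < min_neg:
        return None
    rng = np.random.default_rng(seed)
    chosen = np.concatenate([
        rng.choice(pos, size=n_pos, replace=False),
        rng.choice(neg, size=n_neg, replace=False),
    ])
    return np.sort(chosen)
```

Unannotated keyframes (label -1) are never sampled, which matches the statement that every image in a dataset is labeled either relevant or non-relevant.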
Contents of the Repository:
Query-specific CSV Files: For each of the 80 selected AVS queries (e.g., 1591), the corresponding CSV file (e.g., 1591.csv) contains one column per image, where:
VISIONE image ID is in the first row.
CLIP features are in the subsequent rows.
Relevance annotations are in the last row: 1 for relevant, 0 for non-relevant.
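Because images are stored as columns rather than rows, the file has to be transposed after reading. The sketch below shows one way to do that, assuming the CSV is comma-separated and has no extra header row or index column beyond the layout described above (this has not been checked against the actual files). The function name and the demo image IDs are hypothetical.

```python
import csv
import io
import numpy as np

def load_query_csv(file_obj):
    """Read a column-per-image CSV into (ids, features, labels).

    ASSUMPTION: row 0 holds image IDs, the last row holds 0/1 labels,
    and every row in between is one CLIP feature dimension.
    """
    rows = list(csv.reader(file_obj))
    ids = rows[0]
    # Middle rows are feature dimensions; transpose to (n_images, dim).
    features = np.array(rows[1:-1], dtype=np.float32).T
    labels = np.array([int(float(v)) for v in rows[-1]])
    return ids, features, labels

# Tiny synthetic stand-in for a real file such as 1591.csv:
# three images, two feature dimensions.
demo = io.StringIO("img_a,img_b,img_c\n0.1,0.2,0.3\n0.4,0.5,0.6\n1,0,0\n")
ids, X, y = load_query_csv(demo)
```

For a real file, the same function could be called as `load_query_csv(open("1591.csv", newline=""))`, which should yield one feature row per image.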
Post-processed Datasets:
dataset_normalized.zip: L2-normalized CLIP features.
dataset_softmax.zip: CLIP features converted into probabilities using a softmax function.
dataset_logistic.zip: CLIP features converted into probabilities using a logistic function followed by L1 normalization.
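The three post-processing variants can be approximated with a few lines of NumPy. This is a sketch rather than the exact procedure used to build the archives: it assumes each transformation is applied independently to every image vector across its feature dimensions, with no temperature or scaling factor, since none is mentioned in the description. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(X, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def softmax_rows(X):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    shifted = X - X.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def logistic_l1_rows(X):
    """Element-wise logistic function, then L1 normalization per row
    so that each row sums to 1."""
    s = 1.0 / (1.0 + np.exp(-X))
    return s / s.sum(axis=1, keepdims=True)
```

Both the softmax and the logistic variant produce non-negative rows that sum to 1, so each image vector can be read as a probability distribution over feature dimensions.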
Text Feature Data: clip_laion_text_features.csv contains additional details for each query, including the query ID, query text, and L2-normalized CLIP features extracted from the query text.
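When both the image features and the query text features are L2-normalized, their dot product equals the cosine similarity, which gives a simple text-to-image ranking. The sketch below illustrates this, assuming the text and image features live in the same CLIP embedding space and have the same dimensionality. The function name is hypothetical, and this is not necessarily the ranking method used in the cited paper.

```python
import numpy as np

def rank_by_cosine(image_feats, text_feat):
    """Return image indices sorted by descending similarity to the query,
    along with the sorted scores.

    ASSUMPTION: rows of image_feats and text_feat are already L2-normalized,
    so the dot product is the cosine similarity.
    """
    image_feats = np.asarray(image_feats, dtype=np.float64)
    text_feat = np.asarray(text_feat, dtype=np.float64)
    scores = image_feats @ text_feat
    order = np.argsort(-scores, kind="stable")
    return order, scores[order]
```

Comparing the top-ranked indices against the 0/1 labels in the query-specific CSV file gives a quick check of how well the text query alone separates relevant from non-relevant images.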
Citation and Usage:
This data was used in the experiments described in:
Lucia Vadicamo, Francesca Scotti, Alan Dearle, Richard Connor, Comparative Analysis of Relevance Feedback Techniques for Image Retrieval, in Proceedings of the 31st International Conference on Multimedia Modeling (MMM 2025).
The data is released under a Creative Commons Attribution license. If you use it in your research, please cite the above work.
References:
[1] TRECVID Data: https://www-nlpir.nist.gov/projects/trecvid/trecvid.data.html
[2] Rossetto, L., Schuldt, H., Awad, G., Butt, A.A.: V3C - A research video collection. In: International Conference on Multimedia Modeling, pp. 349–360. Springer (2019).
[3] https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
[4] VISIONE Repository: https://zenodo.org/records/8188570
Created:
2024-10-17



