Image-Guided Object Detection using OWL-ViT and Enhanced Query Embedding Extraction

Indexed by NIAID Data Ecosystem on 2026-05-01

Download link:

https://doi.org/10.7910/DVN/PRHQMK

Description:

Computer vision has been receiving increasing attention following the recent release of complex generative AI models by tech industry giants such as OpenAI and Google. In this work, we focus on one specific subfield: image-guided object detection. A detailed literature survey led us to Simple Open-Vocabulary Object Detection with Vision Transformers (OWL-ViT) [1], a multifunctional model that, in addition to its main open-vocabulary detection task, can also perform image-guided object detection. In this study, we conducted experiments using the OWL-ViT architecture as the base model and modified the relevant parts of it to improve its one-shot performance. Code and models are available on GitHub.
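
The description does not say how OWL-ViT turns a query image into a query embedding. As a rough illustration only, the sketch below re-creates in plain Python the baseline heuristic that, as we understand it, the Hugging Face `transformers` implementation of OWL-ViT uses for image-guided detection: among the query image's predicted boxes, keep those whose IoU with the whole image is within 80% of the best, then pick the one whose class embedding is least similar to the mean embedding. This is an assumption about the baseline, not the authors' enhanced extraction method. The function names (`select_query_embedding`, `score_target_boxes`, `detect`), the toy boxes, and the embeddings are hypothetical, and the real model scores boxes with a learned logit shift and scale followed by a sigmoid rather than raw cosine similarity.

```python
import math
from typing import List, Sequence, Tuple

# A box as (x0, y0, x1, y1) in normalized image coordinates [0, 1].
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def select_query_embedding(
    boxes: Sequence[Box],
    embeds: Sequence[Sequence[float]],
    rel_iou: float = 0.8,
) -> Tuple[int, List[float]]:
    """Hypothetical helper: pick one class embedding from the query image's boxes.

    Keeps boxes whose IoU with the whole image is at least `rel_iou` times the
    best IoU, then returns the candidate least similar to the mean embedding.
    Simplification: no generalized-IoU fallback when nothing overlaps the image.
    """
    whole_image: Box = (0.0, 0.0, 1.0, 1.0)
    ious = [box_iou(whole_image, b) for b in boxes]
    threshold = max(ious) * rel_iou
    candidates = [i for i, iou in enumerate(ious) if iou >= threshold]
    dim = len(embeds[0])
    mean = [sum(e[d] for e in embeds) / len(embeds) for d in range(dim)]
    # The "most unusual" candidate has the lowest dot product with the mean.
    best = min(candidates, key=lambda i: dot(mean, embeds[i]))
    return best, list(embeds[best])


def score_target_boxes(
    query_embed: Sequence[float], target_embeds: Sequence[Sequence[float]]
) -> List[float]:
    """Cosine similarity between the query embedding and each target box embedding."""
    q = l2_normalize(query_embed)
    return [dot(q, l2_normalize(t)) for t in target_embeds]


def detect(
    query_embed: Sequence[float],
    target_boxes: Sequence[Box],
    target_embeds: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Keep target boxes whose similarity to the query meets the threshold."""
    scores = score_target_boxes(query_embed, target_embeds)
    return [(b, s) for b, s in zip(target_boxes, scores) if s >= threshold]


if __name__ == "__main__":
    # Toy query image: two near-whole-image boxes and one small box.
    query_boxes = [(0.0, 0.0, 1.0, 1.0), (0.05, 0.05, 0.95, 0.95), (0.0, 0.0, 0.5, 0.5)]
    query_embeds = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    idx, q = select_query_embedding(query_boxes, query_embeds)

    # Toy target image with three candidate boxes.
    target_boxes = [(0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.9, 0.9), (0.2, 0.6, 0.4, 0.8)]
    target_embeds = [[0.0, 2.0], [1.0, 0.0], [1.0, 1.0]]
    print(idx, detect(q, target_boxes, target_embeds))
```

The query-selection step is the natural place to intervene when targeting one-shot accuracy, because a poorly chosen query embedding degrades every downstream match regardless of how the target boxes are scored.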

Created: 2024-04-14
