Image-Guided Object Detection using OWL-ViTand Enhanced Query Embedding Extraction
收藏NIAID Data Ecosystem2026-05-01 收录
下载链接:
https://doi.org/10.7910/DVN/PRHQMK
下载链接
链接失效反馈官方服务:
资源简介:
Computer vision has been receiving increasing attention with the recent complex Generative AI models released by tech industry giants, such as OpenAI and Google. However, there is a specific subfield that we wanted to focus on, that is, Image-Guided Object Detection. A detailed literature survey directed us towards a successful study called Simple Open-Vocabulary Object Detection with Vision Transformers (OWL-ViT) [1], which is a multifunctional complex model that can also perform image-guided object detection as a side function. In this study, some experiments have been conducted utilizing OWL-ViT architecture as the base model and manipulated the necessary parts to achieve a better one-shot performance. Code and models are available on GitHub.
创建时间:
2024-04-14



