Image-Guided Object Detection using OWL-ViT and Enhanced Query Embedding Extraction
Source: DataONE. Updated 2024-04-14; indexed 2024-10-19.
Download link:
https://search.dataone.org/view/sha256:ba5ef305af6a42ecb45fbcc686af8d69461b6734b87962e14ee243b1b09fcbc2
Resource description:
Computer vision has received growing attention following the release of large generative AI models by major technology companies such as OpenAI and Google. This work focuses on one specific subfield: image-guided object detection, in which an example image of an object, rather than a text prompt, specifies what to find in a target image. A literature survey led us to Simple Open-Vocabulary Object Detection with Vision Transformers (OWL-ViT) [1], a versatile model whose primary task is text-conditioned open-vocabulary detection but which also supports image-guided (one-shot) detection as a secondary capability. In this study, we run experiments that use the OWL-ViT architecture as the base model and modify the components involved in query embedding extraction to improve one-shot performance. Code and models are available on GitHub.
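The description does not spell out how image-guided detection works. As a rough illustration only, assuming the general one-shot scheme described for OWL-ViT in [1] (pick one predicted box embedding from the query image as the query, then score every predicted box in the target image against it), a minimal standard-library sketch might look like the following. The selection rule (highest IoU with a user-marked query region), the plain cosine-similarity scoring, the threshold, and all function names are hypothetical simplifications; this is neither the authors' enhanced extraction method nor OWL-ViT's actual implementation, which scores boxes through its learned class head.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in normalized image coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_query_embedding(query_boxes: List[Box],
                           query_embeds: List[List[float]],
                           query_region: Box) -> List[float]:
    """Hypothetical rule: use the embedding of the predicted box in the
    query image that overlaps the marked query region the most."""
    best = max(range(len(query_boxes)),
               key=lambda i: iou(query_boxes[i], query_region))
    return query_embeds[best]


def image_guided_detect(query_embed: List[float],
                        target_boxes: List[Box],
                        target_embeds: List[List[float]],
                        threshold: float) -> List[Tuple[Box, float]]:
    """Keep target boxes whose embedding is similar enough to the query,
    sorted from highest to lowest score."""
    detections = []
    for box, emb in zip(target_boxes, target_embeds):
        score = cosine(query_embed, emb)
        if score >= threshold:
            detections.append((box, score))
    return sorted(detections, key=lambda d: d[1], reverse=True)


# Toy usage with made-up 2-D embeddings (not real OWL-ViT outputs).
query_boxes = [(0.0, 0.0, 0.5, 0.5), (0.4, 0.4, 1.0, 1.0)]
query_embeds = [[1.0, 0.0], [0.0, 1.0]]
query = select_query_embedding(query_boxes, query_embeds, (0.5, 0.5, 1.0, 1.0))

target_boxes = [(0.1, 0.1, 0.3, 0.3), (0.6, 0.6, 0.9, 0.9), (0.2, 0.6, 0.4, 0.8)]
target_embeds = [[1.0, 0.0], [0.1, 0.9], [0.6, 0.8]]
dets = image_guided_detect(query, target_boxes, target_embeds, threshold=0.5)
for box, score in dets:
    print(box, round(score, 3))
```

The sketch makes the point of the title concrete: detection quality depends heavily on which embedding is chosen as the query, so a better query-extraction rule than the naive IoU pick shown here is where improvements to one-shot performance would come from.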
Created:
2024-09-24



