Data for: Making Large Vision Language Models Better Few-Shot Learners
Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/data-making-large-vision-language-models-better-few-shot-learners-0
Resource description:
Few-shot classification (FSC) aims to emulate the human ability to learn new concepts rapidly from a handful of examples. Large Vision-Language Models (LVLMs), with their rich prior knowledge and strong visual-linguistic understanding, are emerging as a highly promising paradigm for FSC. This paper investigates the challenges of applying LVLMs to FSC tasks and identifies two core learning bottlenecks. The first is the model's inherent positional bias, such as a tendency to favor the options that appear last in a textual list of choices. The second is insufficient learning: the model tends to rely on its vast pre-trained knowledge rather than genuinely generalizing new knowledge from the support samples provided in the current task. In addition, the support-query paradigm of FSC poses a significant efficiency challenge, because the long input sequences produced by multiple image-text inputs lead to high inference costs. To address these challenges, we first correct the model's positional bias by constructing positionally balanced meta-tasks for instruction fine-tuning. To improve the model's ability to generalize, we introduce a semantic-guided background generation strategy that breaks spurious visual correlations between foreground and background. Finally, we propose a hard negative mining strategy that organizes instruction fine-tuning more efficiently, compelling the model to focus on more discriminative features.
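The listing does not include code, so the following is only a minimal sketch of how two of the described ideas could be combined when building multiple-choice instruction data. It assumes that "positionally balanced" means the correct answer appears at every option position equally often (achieved here by cyclic rotation), and that hard negatives are the classes whose embeddings have the highest cosine similarity to the target class. The function names, the prompt wording, and the toy embedding vectors are all hypothetical; the semantic-guided background generation step is not sketched because it requires a generative image model.

```python
import math
from typing import Dict, List, Sequence, Tuple

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def balanced_option_orders(correct: str, distractors: Sequence[str]) -> List[List[str]]:
    """Cyclically rotate the option list so the correct answer (and every
    distractor) occupies each position exactly once across the returned orders."""
    options = [correct, *distractors]
    return [options[shift:] + options[:shift] for shift in range(len(options))]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hard_negatives(target: str, embeddings: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Hypothetical hard negative mining: the k other classes most similar to the target."""
    scored = [(cosine(embeddings[target], vec), name)
              for name, vec in embeddings.items() if name != target]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:k]]


def build_meta_tasks(target: str, embeddings: Dict[str, Sequence[float]],
                     k: int) -> List[Tuple[str, str]]:
    """Build one (prompt, answer letter) pair per answer position, using hard
    negatives as distractors (an assumed construction, not the paper's code)."""
    distractors = hard_negatives(target, embeddings, k)
    tasks = []
    for order in balanced_option_orders(target, distractors):
        lines = [f"{LETTERS[i]}. {name}" for i, name in enumerate(order)]
        prompt = "Which class does the query image belong to?\n" + "\n".join(lines)
        tasks.append((prompt, LETTERS[order.index(target)]))
    return tasks


# Toy class embeddings (hypothetical values; in practice they might come from an encoder).
embeddings = {
    "husky": [0.9, 0.1, 0.0],
    "wolf": [0.85, 0.2, 0.05],
    "malamute": [0.88, 0.15, 0.02],
    "tabby cat": [0.1, 0.9, 0.1],
    "sports car": [0.0, 0.1, 0.95],
}

tasks = build_meta_tasks("husky", embeddings, k=2)
for prompt, answer in tasks:
    print(prompt, "->", answer)
```

With these toy vectors, the distractors chosen for "husky" are "malamute" and "wolf" (the visually similar dog classes), and the three generated prompts place the correct answer at positions A, C, and B respectively, so no single position is over-represented in the fine-tuning data.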
Provided by:
Yi Xu