Data for: Making Large Vision Language Models Better Few-Shot Learners

Indexed from IEEE on 2026-04-17
Download link:
https://ieee-dataport.org/documents/data-making-large-vision-language-models-better-few-shot-learners-0
Resource description:
Few-shot classification (FSC) aims to emulate the human ability to rapidly learn new concepts from a handful of examples. Large Vision-Language Models (LVLMs), with their rich prior knowledge and powerful visio-linguistic understanding capabilities, are emerging as a highly promising paradigm for FSC. This paper investigates the challenges of applying LVLMs to FSC tasks and identifies two core learning bottlenecks. The first is the model's inherent positional bias, such as favoring the last options in textual choices. The second is insufficient learning: the model tends to rely on its vast pre-trained knowledge rather than genuinely generalizing new knowledge from the support samples provided in the current task. In addition, the support-query paradigm in FSC presents a significant efficiency challenge, because the long sequences produced by multiple image-text inputs lead to high inference costs. To address these challenges, we first correct the model's positional bias by constructing positionally balanced meta-tasks for instruction fine-tuning. To enhance the model's generalized learning, we introduce a semantic-guided background generation strategy that breaks spurious visual correlations between foreground and background. Finally, we propose a hard negative mining strategy to organize more efficient instruction fine-tuning, compelling the model to focus on more discriminative feature information.
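
The description above mentions "positionally balanced meta-tasks" without saying how they are built. The Python sketch below shows one plausible reading, assuming a multiple-choice N-way episode in which the slot of the correct option is cycled so that every position is the answer equally often. The function name `build_balanced_meta_tasks`, the task dictionary layout, and the toy class names are hypothetical illustrations, not the authors' code or the format of this dataset.

```python
import random
from collections import Counter


def build_balanced_meta_tasks(class_names, n_way, n_tasks, seed=0):
    """Build multiple-choice meta-tasks whose correct option is spread
    evenly over the option positions (hypothetical helper)."""
    rng = random.Random(seed)
    tasks = []
    for i in range(n_tasks):
        # Sample the candidate classes for this episode; treat the first
        # sampled class as the query's true class (an assumption).
        classes = rng.sample(class_names, n_way)
        target, distractors = classes[0], classes[1:]
        # Cycle the answer slot so each position is used equally often.
        answer_pos = i % n_way
        rng.shuffle(distractors)
        options = distractors[:answer_pos] + [target] + distractors[answer_pos:]
        tasks.append(
            {"options": options, "answer_index": answer_pos, "answer": target}
        )
    return tasks


# Toy usage: 100 five-way tasks over seven placeholder classes.
classes = ["cat", "dog", "fox", "owl", "bee", "elk", "yak"]
tasks = build_balanced_meta_tasks(classes, n_way=5, n_tasks=100)
position_counts = Counter(t["answer_index"] for t in tasks)
```

When `n_tasks` is a multiple of `n_way`, `position_counts` is uniform across positions, so a model fine-tuned on such tasks cannot reduce its loss by always preferring one option slot.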
Provided by:
Yi Xu