
Few-shot Image Classification Method Based on Salient Position Interaction Transformer

Source: 中国科学数据 (China Scientific Data); updated 2026-02-09; indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0070134
Summary:
Image classification, a fundamental task in computer vision, has achieved remarkable results on large-scale datasets. However, traditional deep learning methods tend to overfit when only a few training samples are available, which degrades the model's generalization ability. To address this issue, this study proposes a few-shot image classification method that improves classification performance when labeled data are scarce. The method builds on the structure and strengths of the Vision Transformer (ViT) and combines a salient position interaction Transformer with a target classifier. It introduces an Interaction Multi-Head Self-Attention module with salient position selection (HI-MHSA), which increases the interaction among the attention heads of the multi-head self-attention module, strengthens the model's attention to salient regions of the input image, and reduces computational cost. Supervision and guidance from the target classifier further improve the model's learning efficiency and accuracy. Experiments on the miniImageNet, tieredImageNet, and CUB datasets show that the proposed method achieves classification accuracies of approximately 67.09%, 72.07%, and 79.82%, respectively, on 5-way 1-shot tasks, and approximately 83.54%, 85.62%, and 90.35%, respectively, on 5-way 5-shot tasks. These results indicate that the method is effective and practical for few-shot image classification.
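The summary does not give the HI-MHSA equations, so the following is only a rough, hypothetical illustration of the two ideas it names, written in plain Python with no ML framework. It assumes (1) "interaction between attention heads" is modeled as talking-heads-style mixing of the pre-softmax attention logits through an H x H matrix `mix`, and (2) "salient position selection" is modeled as keeping the top-k patch tokens ranked by CLS-token attention averaged over heads. Both choices, and the names `head_interaction_attention`, `select_salient_tokens`, `keep_tokens`, and `mix`, are assumptions for illustration and are not taken from the paper; the target classifier is not modeled.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x d) matrix and a (d x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def head_interaction_attention(q_heads, k_heads, v_heads, mix):
    """Hypothetical head-interaction MHSA (talking-heads style, an assumption).

    q_heads, k_heads, v_heads: H matrices of shape (n x d_h).
    mix: H x H matrix; head h's logits become sum_g mix[h][g] * logits[g].
    """
    num_heads = len(q_heads)
    n = len(q_heads[0])
    scale = 1.0 / math.sqrt(len(q_heads[0][0]))
    # Scaled dot-product logits per head: H matrices of shape (n x n).
    logits = [[[x * scale for x in row] for row in matmul(q, transpose(k))]
              for q, k in zip(q_heads, k_heads)]
    # Head interaction: linearly mix the logits of all heads before softmax.
    mixed = [[[sum(mix[h][g] * logits[g][i][j] for g in range(num_heads))
               for j in range(n)] for i in range(n)] for h in range(num_heads)]
    attn = [[softmax(row) for row in mixed[h]] for h in range(num_heads)]
    outputs = [matmul(attn[h], v_heads[h]) for h in range(num_heads)]
    return outputs, attn


def select_salient_tokens(attn, k: int) -> List[int]:
    """Indices of the k patch tokens (index 0 is CLS) that receive the most
    CLS attention averaged over heads, returned in their original order."""
    num_heads, n = len(attn), len(attn[0])
    score = {j: sum(attn[h][0][j] for h in range(num_heads)) / num_heads
             for j in range(1, n)}
    top = sorted(score, key=score.get, reverse=True)[:k]
    return sorted(top)


def keep_tokens(tokens: Matrix, salient: List[int]) -> Matrix:
    """CLS token plus the selected patch tokens, so later layers attend over
    k + 1 tokens instead of n."""
    return [tokens[0]] + [tokens[j] for j in salient]


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    H, n, d_h = 2, 6, 4  # 2 heads, CLS + 5 patch tokens, 4 dims per head
    queries = [random_matrix(n, d_h) for _ in range(H)]
    keys = [random_matrix(n, d_h) for _ in range(H)]
    values = [random_matrix(n, d_h) for _ in range(H)]
    mix = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical learned mixing weights
    _, attn = head_interaction_attention(queries, keys, values, mix)
    salient = select_salient_tokens(attn, k=3)
    tokens = random_matrix(n, 8)  # stand-in token embeddings
    print("salient patch indices:", salient)
    print("tokens kept:", len(keep_tokens(tokens, salient)))
```

Setting `mix` to the identity matrix recovers ordinary multi-head self-attention, which isolates the effect of the head-interaction term. Dropping non-salient patches shrinks the quadratic attention cost of subsequent layers, which is one plausible reading of the computational saving the summary mentions.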
Created:
2026-02-09