Few-shot Image Classification Method Based on Salient Position Interaction Transformer

Source: China Scientific Data (中国科学数据); updated 2026-02-09; indexed 2026-04-25

Download link:

https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0070134

Resource summary:

Image classification, a fundamental task in computer vision, has achieved remarkable results on large-scale datasets. However, traditional deep learning methods tend to overfit when only a few labeled samples are available, which degrades the model's generalization ability. To address this issue, this study presents a novel few-shot image classification method that improves classification performance when sample data are scarce. The method is built on a salient position interaction Transformer and a target classifier, and leverages the structure and strengths of the Vision Transformer (ViT). It introduces an Interaction Multi-Head Self-Attention (HI-MHSA) module with salient position selection, which increases the interaction among the attention heads of the multi-head self-attention module, strengthens the model's attention to salient regions of the input image, and saves computational resources; supervision and guidance from the target classifier further improve the model's learning efficiency and accuracy. Experimental results show that on the miniImageNet, tieredImageNet, and CUB datasets the proposed method achieves classification accuracies of approximately 67.09%, 72.07%, and 79.82%, respectively, in the 5-way 1-shot setting, and approximately 83.54%, 85.62%, and 90.35%, respectively, in the 5-way 5-shot setting. These results indicate that the proposed method performs well and is highly practical for few-shot image classification tasks.
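
The summary does not say how HI-MHSA is built, so the following is only a hypothetical sketch of the two ideas it names, not the authors' method. It assumes: queries, keys, and values are the raw token slices (no learned projections); "head interaction" is modeled as talking-heads-style mixing of attention logits across heads with an H x H matrix `mix`; and "salient position selection" is modeled as keeping the CLS token plus the top-k patch tokens ranked by CLS-to-patch attention averaged over heads. All function names (`mix_heads`, `salient_positions`, `hi_mhsa_sketch`, and so on) are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention_logits(tokens, num_heads):
    """Per-head scaled dot-product logits, shape [H][N][N].

    Assumption: Q = K = the head's slice of the raw tokens (no learned projections).
    """
    n, d = len(tokens), len(tokens[0])
    assert d % num_heads == 0, "embedding size must be divisible by num_heads"
    dh = d // num_heads
    logits = []
    for h in range(num_heads):
        sl = [t[h * dh:(h + 1) * dh] for t in tokens]
        logits.append([[sum(a * b for a, b in zip(sl[i], sl[j])) / math.sqrt(dh)
                        for j in range(n)] for i in range(n)])
    return logits


def mix_heads(logits, mix):
    """Hypothetical head interaction: head g's logits become
    sum_h mix[g][h] * logits[h] (talking-heads-style mixing before the softmax)."""
    H, n = len(logits), len(logits[0])
    return [[[sum(mix[g][h] * logits[h][i][j] for h in range(H)) for j in range(n)]
             for i in range(n)] for g in range(H)]


def salient_positions(probs, k):
    """Hypothetical salient position selection: rank patch tokens (index >= 1;
    index 0 is the CLS token) by CLS-to-patch attention averaged over heads and
    return the top-k indices in their original order."""
    H, n = len(probs), len(probs[0])
    scores = {j: sum(probs[h][0][j] for h in range(H)) / H for j in range(1, n)}
    top = sorted(scores, key=lambda j: -scores[j])[:k]
    return sorted(top)


def attend(tokens, probs, num_heads):
    """Weight each head's value slice (V = raw token slice, an assumption)
    by its attention probabilities and concatenate the heads."""
    n, d = len(tokens), len(tokens[0])
    dh = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        for i in range(n):
            for j in range(n):
                w = probs[h][i][j]
                for c in range(h * dh, (h + 1) * dh):
                    out[i][c] += w * tokens[j][c]
    return out


def hi_mhsa_sketch(tokens, num_heads, mix, k):
    """One interaction-attention step, then keep only the CLS output plus the
    outputs at k salient patch positions so later layers process fewer tokens."""
    logits = mix_heads(head_attention_logits(tokens, num_heads), mix)
    probs = [[softmax(row) for row in head] for head in logits]
    out = attend(tokens, probs, num_heads)
    keep = salient_positions(probs, k)
    return keep, [out[0]] + [out[j] for j in keep]


if __name__ == "__main__":
    # CLS token followed by three patch tokens (embedding size 4, two heads).
    tokens = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [-1, 0, -1, 0]]
    mix = [[0.7, 0.3], [0.3, 0.7]]
    keep, reduced = hi_mhsa_sketch(tokens, num_heads=2, mix=mix, k=2)
    print(keep, len(reduced))  # prints [1, 2] 3
```

With `mix` set to the identity matrix the sketch reduces to ordinary multi-head self-attention followed by token pruning; off-diagonal entries let each head's attention map borrow from the others. Dropping low-scoring patches is one plausible reading of how salient position selection "saves computational resources", since attention cost grows quadratically with the number of tokens.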

Created: 2026-02-09
