KDViT: COVID-19 diagnosis on CT-scans with knowledge distillation of vision transformer

Name: KDViT: COVID-19 diagnosis on CT-scans with knowledge distillation of vision transformer
Creator: Taylor & Francis
Published: 2024-05-16 07:10:20
License: No description available

Source: DataCite Commons. Updated: 2024-05-16. Indexed: 2024-08-19

Download link:

https://tandf.figshare.com/articles/dataset/KDViT_COVID-19_diagnosis_on_CT-scans_with_knowledge_distillation_of_vision_transformer/25826882

Resource description:

This paper introduces Knowledge Distillation of Vision Transformer (KDViT), an approach to medical image classification. The Vision Transformer architecture uses a self-attention mechanism to learn image structure. The input medical image is split into patches, and each patch is projected to a low-dimensional linear embedding. Position information is added to each patch embedding to preserve spatial relationships within the image, and a learnable classification token is appended for classification. The resulting vectors are fed into a Transformer encoder, which extracts both local and global features through its attention mechanism. Knowledge distillation is then used to improve performance by transferring knowledge from a large teacher model to a small student model. This reduces the computational requirements relative to the larger model and improves overall effectiveness. According to the authors, combining knowledge distillation with two Vision Transformer models also improves model interpretability, reduces computational complexity, and improves generalization. The proposed KDViT model achieved accuracies of 98.39%, 88.57%, and 99.15% on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively, which the authors report as surpassing other state-of-the-art methods.
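The two ideas in the description, patch splitting and teacher-to-student distillation, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the record does not state which loss KDViT uses, so the temperature-scaled soft-target loss below is assumed as a common formulation, and the function names, `temperature`, and `alpha` are hypothetical.

```python
import math


def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened, non-overlapping tiles."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "size must divide evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch x patch tile in row-major order.
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed loss: alpha * hard-label CE + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    toy_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(split_into_patches(toy_image, 2))
    print(distillation_loss([0.0, 2.0], [2.0, 0.0], label=0))
```

In a full Vision Transformer, each flattened patch would then be multiplied by a learned projection matrix and summed with a position embedding before entering the encoder; those learned parts are omitted here. The soft term is zero when the student's logits match the teacher's, so the loss then reduces to the weighted hard-label term.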

Provider:

Taylor & Francis

Created:

2024-05-15