Knowledge Distillation Method Comparison
Source: DataCite Commons. Updated 2025-03-26. Indexed 2025-04-16.
Download link:
https://ieee-dataport.org/documents/knowledge-distillation-method-comparison
Description:
Artificial Intelligence (AI) has increasingly influenced modern society, most recently through significant advances in Large Language Models (LLMs). However, the high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a smaller language model (the student) from a larger one (the teacher). Previous research has introduced several distillation methods, both for generating training data and for training the student model. Despite their relevance, the effects of state-of-the-art distillation methods on model performance and explainability have not been thoroughly investigated and compared. In our work, we enlarge the set of available methods by applying Critique-Revision Prompting to distillation and by synthesizing existing methods. We compare these methods systematically on a popular commonsense question-answering dataset. We measure performance via student model accuracy and evaluate explainability with a human-grounded study. Our contributions are the new distillation methods and their comparison in terms of both performance and explainability.
Provider:
IEEE DataPort
Created:
2025-03-26



