EA-KD: Entropy-based Adaptive Knowledge Distillation

Abstract

Knowledge distillation (KD) enables a smaller "student" model to mimic a larger "teacher" model by transferring knowledge from the teacher's output or features. However, most KD methods treat all samples uniformly, overlooking the varying learning value of each sample and thereby limiting their effectiveness. In this paper, we propose Entropy-based Adaptive Knowledge Distillation (EA-KD), a simple yet effective plug-and-play KD method that prioritizes learning from valuable samples. EA-KD quantifies each sample's learning value by strategically combining the entropy of the teacher and student output, then dynamically reweights the distillation loss to place greater emphasis on high-entropy samples. Extensive experiments across diverse KD frameworks and tasks -- including image classification, object detection, and large language model (LLM) distillation -- demonstrate that EA-KD consistently enhances performance, achieving state-of-the-art results with negligible computational cost. Code is available at: https://github.com/cpsu00/EA-KD
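
The reweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the teacher and student entropies are combined, so the plain sum used below, the normalization of weights to mean 1, the temperature value, and all function names are assumptions made purely for illustration. The official code is at the repository linked above.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one sample's logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the usual per-sample logit distillation loss."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def entropy_weighted_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Hypothetical entropy-weighted KD loss over a batch of samples.

    ASSUMPTION: each sample's weight is the sum of teacher and student
    output entropies, normalized so the batch mean weight is 1. The
    paper's actual combination rule may differ.
    """
    weights, losses = [], []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_t = softmax(t_logits, temperature)
        p_s = softmax(s_logits, temperature)
        weights.append(entropy(p_t) + entropy(p_s))  # assumed combination
        losses.append(kl_divergence(p_t, p_s))
    mean_w = sum(weights) / len(weights)
    if mean_w > 0.0:
        weights = [w / mean_w for w in weights]
    else:
        weights = [1.0] * len(weights)  # fall back to uniform weighting
    loss = sum(w * l for w, l in zip(weights, losses)) / len(losses)
    return loss, weights

# Sample 0: confident teacher (low entropy); sample 1: uncertain teacher.
teacher = [[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
student = [[5.0, 0.0, 0.0], [1.0, 0.5, 0.0]]
loss, weights = entropy_weighted_kd_loss(teacher, student)
print(loss, weights)
```

In this toy batch the second sample, whose teacher output is uniform, receives the larger weight, which is the qualitative behavior the abstract describes: higher-entropy samples contribute more to the distillation loss.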
