Abstract
Existing knowledge distillation methods generally use a teacher-student approach, where the student network solely learns from a well-trained teacher. However, this approach overlooks the inherent differences in learning abilities between the teacher and student networks, thus causing the capacity-gap problem. To address this limitation, we propose a novel method called SLKD.