A Fast Knowledge Distillation Framework for Visual Recognition

Abstract

While Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as supervised classification and self-supervised representation learning, the main drawback of a vanilla KD framework is its mechanism, which consumes the majority of the computational overhead on forwarding through the giant teacher networks, making the entire learning procedure inefficient and costly. ReLabel, a recently proposed solution, suggests creating a label map for the entire image. During training, it receives the cropped region-level label by RoI aligning on a pre-generated entire label map, allowing for efficient supervision generation without having to pass through the teachers many times. However, as the KD teachers are from conventional multi-crop training, there are various mismatches between the global label-map and region-level label in this technique, resulting in performance deterioration. In this study, we present a Fast Knowledge Distillation (FKD) framework that replicates the distillation training phase and generates soft labels using the multi-crop KD approach, while training faster than ReLabel since no post-processes such as RoI align and softmax operations are used. When conducting multi-crop in the same image for data loading, our FKD is even more efficient than the traditional image classification framework. On ImageNet-1K, we obtain 79.8% with ResNet-50, outperforming ReLabel by ~1.0% while being faster. On the self-supervised learning task, we also show that FKD has an efficiency advantage. Our project page: http://zhiqiangshen.com/projects/FKD/index.html, source code and models are available at: https://github.com/szq0214/FKD.
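
The two-phase idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the crop sampler, the storage layout, and the names `toy_teacher`, `generate_labels`, and `training_loss` are all hypothetical, and a toy function stands in for the teacher network. Phase 1 runs the teacher once per sampled crop and stores the crop box together with its soft label. Phase 2 reuses the stored pairs, so training needs no teacher forward pass, no RoI align, and no softmax over the stored labels.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_crop(rng, width, height, min_scale=0.08):
    """Sample a random crop box (x, y, w, h) inside the image (assumed sampler)."""
    scale = rng.uniform(min_scale, 1.0)
    w = max(1, int(width * math.sqrt(scale)))
    h = max(1, int(height * math.sqrt(scale)))
    x = rng.randint(0, width - w)
    y = rng.randint(0, height - h)
    return (x, y, w, h)


def toy_teacher(crop, num_classes):
    """Hypothetical stand-in for a teacher network: maps a crop box to logits."""
    x, y, w, h = crop
    base = 0.01 * (x + 2 * y + 3 * w + 5 * h)
    return [math.sin(base * (k + 1)) for k in range(num_classes)]


def generate_labels(image_sizes, crops_per_image, num_classes, seed=0):
    """Phase 1: run the teacher once per crop and store (crop, soft label)."""
    rng = random.Random(seed)
    store = {}
    for image_id, (width, height) in image_sizes.items():
        entries = []
        for _ in range(crops_per_image):
            crop = sample_crop(rng, width, height)
            entries.append((crop, softmax(toy_teacher(crop, num_classes))))
        store[image_id] = entries
    return store


def soft_cross_entropy(student_logits, soft_label):
    """Cross-entropy between a stored soft label and the student prediction."""
    log_probs = [math.log(p) for p in softmax(student_logits)]
    return -sum(t * lp for t, lp in zip(soft_label, log_probs))


def training_loss(store, image_id, student_fn):
    """Phase 2: average loss over all stored crops of one image.

    Only the stored crops and labels are used; the teacher is never called.
    """
    losses = [soft_cross_entropy(student_fn(crop), label)
              for crop, label in store[image_id]]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    store = generate_labels({"img0": (224, 224)}, crops_per_image=4, num_classes=10)
    uniform_student = lambda crop: [0.0] * 10  # untrained student, flat logits
    print(training_loss(store, "img0", uniform_student))
```

Loading every stored crop of an image in the same step, as `training_loss` does, mirrors the abstract's point that multi-crop sampling from one image amortizes data loading. The real framework would apply the stored boxes to decoded image tensors rather than to bare coordinates.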
