What Makes a Good Dataset for Knowledge Distillation?

  • 2025-10-02 17:54:35
  • Logan Frank, Jim Davis

Abstract

Knowledge distillation (KD) has been a popular and effective method for model compression. One important assumption of KD is that the teacher's original dataset will also be available when training the student. However, in situations such as continual learning and distilling large models trained on company-withheld datasets, having access to the original data may not always be possible. This leads practitioners towards utilizing other sources of supplemental data, which could yield mixed results. One must then ask: "what makes a good dataset for transferring knowledge from teacher to student?" Many would assume that only real in-domain imagery is viable, but is that the only option? In this work, we explore multiple possible surrogate distillation datasets and demonstrate that many different datasets, even unnatural synthetic imagery, can serve as a suitable alternative in KD. From examining these alternative datasets, we identify and present various criteria describing what makes a good dataset for distillation. Source code is available at https://github.com/osu-cvl/good-kd-dataset.
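Surrogate data can work at all because the usual distillation objective does not need ground-truth labels: the student is trained to match the teacher's softened output distribution on whatever images it is shown. The sketch below illustrates this with the common Hinton-style, temperature-scaled KL-divergence loss. It is an assumption that this matches the loss used in this paper; the function names (`softmax`, `kd_loss`, `batch_kd_loss`) and the default temperature of 4.0 are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: Sequence[float],
            student_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """T^2 * KL(teacher_soft || student_soft) for a single image.

    No ground-truth label appears anywhere: the teacher's softened prediction
    is the only target, so the image may come from any surrogate dataset.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures
    return (temperature ** 2) * kl


def batch_kd_loss(teacher_batch: Sequence[Sequence[float]],
                  student_batch: Sequence[Sequence[float]],
                  temperature: float = 4.0) -> float:
    """Mean KD loss over a batch of (possibly unlabeled) surrogate images."""
    losses = [kd_loss(t, s, temperature) for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits for two surrogate images over three classes
    teacher = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
    student_far = [[-1.0, 0.5, 2.0], [3.0, 0.1, 0.2]]
    print(batch_kd_loss(teacher, teacher))      # 0.0: student matches teacher exactly
    print(batch_kd_loss(teacher, student_far))  # positive: student disagrees with teacher
```

In a standard KD training loop the teacher is kept frozen and only the student's parameters are updated to reduce this loss, which is why the question of which images to feed both networks, rather than which labels to use, becomes the central one.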

 
