Dataset Distillation

  • 2018-11-27 13:17:45
  • Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros
Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one. The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data. For example, we show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few steps of gradient descent, given a particular fixed network initialization. We evaluate our method in a wide range of initialization settings and with different learning objectives. Experiments on multiple datasets show the advantage of our approach compared to alternative methods in most settings.
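To make the idea concrete, here is a minimal toy sketch, not the paper's implementation. It assumes a one-parameter linear model y = w * x, a fixed initialization w0, and a single synthetic point (x_syn, y_syn) that is optimized so that one gradient step on it, starting from w0, yields a weight that fits the real data. All names (make_real_data, inner_step, distill), the learning rates, and the synthetic regression data are hypothetical choices for illustration; the paper works with neural networks and image datasets.

```python
import random


def make_real_data(n=200, true_w=3.0, seed=0):
    """Hypothetical 'large' dataset: y = true_w * x + small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def real_loss_and_grad(w, xs, ys):
    """Mean of 0.5 * (w*x - y)^2 over the real data, and its derivative in w."""
    n = len(xs)
    loss = sum(0.5 * (w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def inner_step(w0, x_syn, y_syn, lr):
    """One gradient-descent step on the single synthetic point."""
    return w0 - lr * (w0 * x_syn - y_syn) * x_syn


def distill(xs, ys, w0=0.0, inner_lr=0.5, outer_lr=0.1, steps=500):
    """Learn (x_syn, y_syn) so that one inner step from w0 fits the real data."""
    x_syn, y_syn = 1.0, 0.0  # arbitrary starting point for the synthetic sample
    for _ in range(steps):
        w1 = inner_step(w0, x_syn, y_syn, inner_lr)
        _, g_w1 = real_loss_and_grad(w1, xs, ys)
        # Chain rule through the inner step: derivatives of w1 in x_syn, y_syn.
        dw1_dx = -inner_lr * (2.0 * w0 * x_syn - y_syn)
        dw1_dy = inner_lr * x_syn
        x_syn -= outer_lr * g_w1 * dw1_dx
        y_syn -= outer_lr * g_w1 * dw1_dy
    return x_syn, y_syn


if __name__ == "__main__":
    xs, ys = make_real_data()
    x_syn, y_syn = distill(xs, ys)
    w1 = inner_step(0.0, x_syn, y_syn, 0.5)
    print("synthetic point:", (x_syn, y_syn))
    print("weight after one step on it:", w1)
    print("real-data loss:", real_loss_and_grad(w1, xs, ys)[0])
```

The learned point need not resemble any real sample; it only has to steer the fixed initialization to a good weight in one step, which mirrors the abstract's remark that distilled data do not need to come from the correct data distribution. The sketch also depends on the chosen w0, in the same way the headline MNIST result is stated for a particular fixed initialization.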


