Extrapolating from a Single Image to a Thousand Classes using Distillation

Abstract

What can neural networks learn about the visual world from a single image? While one image obviously cannot contain the multitudes of possible objects, scenes and lighting conditions that exist within the space of all 256^(3×224×224) possible 224×224 RGB images, it might still provide a strong prior for natural images. To analyze this hypothesis, we develop a framework for training neural networks from scratch on a single image by means of knowledge distillation from a teacher pretrained with supervision. With this framework, we find that the answer to the above question is: "surprisingly, a lot". In quantitative terms, we obtain top-1 accuracies of 94% on CIFAR-10, 74% on CIFAR-100 and 59% on ImageNet, and, by extending the method to audio, 84% on SpeechCommands. In extensive analyses we disentangle the effects of augmentations, the choice of source image and the network architecture, and we also discover "panda neurons" in networks that have never seen a panda. This work shows that one image can be used to extrapolate to thousands of object classes, and it motivates a renewed research agenda on the fundamental interplay between augmentations and the image.
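The abstract does not state the exact distillation objective. A common choice for knowledge distillation is the temperature-softened KL divergence between teacher and student outputs, scaled by T² (Hinton et al.); the sketch below assumes that form. In a single-image setting, both networks would see the same augmented crop of the one source image, and the student is trained only to match the teacher's output distribution, so no labels are needed. The function names and the temperature value 4.0 are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # log-sum-exp trick: subtract the max before exponentiating
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: generic Hinton-style KD loss, used here only to illustrate
    the idea of matching a pretrained teacher on augmented crops.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [3.0, 0.0, -3.0]  # hypothetical teacher logits for one crop
    print(distillation_loss(teacher, teacher))        # identical outputs: 0 loss
    print(distillation_loss([0.0, 0.0, 0.0], teacher))  # uniform student: positive loss
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge; the T² factor keeps gradient magnitudes roughly comparable across temperatures.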
