In this paper, we propose imitation networks, a simple but effective method for training neural networks with a limited amount of training data. Our approach inherits the idea of knowledge distillation, which transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method applies this idea to mimic the predictions of reference estimators that are much more robust against overfitting than the network we want to train. Unlike almost all previous work on knowledge distillation, which requires a large amount of labeled training data, the proposed method requires only a small amount of training data. Instead, we introduce pseudo training examples that are optimized as part of the model parameters. Experimental results on several benchmark datasets demonstrate that the proposed method outperformed all the other baselines, such as naive training of the target model and standard knowledge distillation.
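To make the core mechanism concrete, the following is a minimal sketch, not the method evaluated in this paper. It assumes a 1-D regression task. A kernel ridge regressor stands in for the robust reference estimator, and a quadratic model stands in for the small target network. The student is trained to imitate the reference's predictions on the few real inputs plus a set of pseudo inputs `z`, which are updated alongside the student weights `w`. The abstract only states that pseudo examples are optimized as model parameters. The update rule used here, gradient ascent on the student/reference discrepancy so that pseudo inputs move toward regions of disagreement, is our assumption. All names (`fit_reference`, `features`, `imitate`) and hyperparameters (`GAMMA`, `LAM`, learning rates) are hypothetical.

```python
import numpy as np

GAMMA, LAM = 1.0, 0.1  # hypothetical kernel width and ridge penalty

def fit_reference(x, y):
    """Kernel ridge regression on 1-D inputs (assumed stand-in reference)."""
    k = np.exp(-GAMMA * (x[:, None] - x[None, :]) ** 2)
    alpha = np.linalg.solve(k + LAM * np.eye(len(x)), y)

    def predict(q):
        return np.exp(-GAMMA * (q[:, None] - x[None, :]) ** 2) @ alpha

    def grad(q):
        # Elementwise derivative of predict(q) with respect to each query point.
        diff = q[:, None] - x[None, :]
        return (-2 * GAMMA * diff * np.exp(-GAMMA * diff ** 2)) @ alpha

    return predict, grad

def features(q):
    # Small hypothetical "target" model: quadratic in the input.
    return np.stack([np.ones_like(q), q, q ** 2], axis=1)

def imitate(x, y, n_pseudo=8, steps=2000, lr_w=0.01, lr_z=0.05, seed=0):
    ref, ref_grad = fit_reference(x, y)
    rng = np.random.default_rng(seed)
    lo, hi = x.min(), x.max()
    z = rng.uniform(lo, hi, size=n_pseudo)  # pseudo inputs, trained like parameters
    w = np.zeros(3)                         # student weights
    for _ in range(steps):
        q = np.concatenate([x, z])
        phi = features(q)
        resid = phi @ w - ref(q)  # student prediction minus reference prediction
        # Gradient descent on w: mean squared imitation error over real + pseudo inputs.
        w -= lr_w * 2 * phi.T @ resid / len(q)
        # Gradient ascent on z (assumption): move pseudo inputs toward disagreement,
        # clipped to the range spanned by the real inputs.
        rz = features(z) @ w - ref(z)
        dz = (w[1] + 2 * w[2] * z) - ref_grad(z)
        z = np.clip(z + lr_z * 2 * rz * dz, lo, hi)
    return w, z, ref

x = np.linspace(-2.0, 2.0, 5)  # only five labeled examples
w, z, ref = imitate(x, np.sin(x))
grid = np.linspace(-2.0, 2.0, 101)
print("imitation MSE on grid:", np.mean((features(grid) @ w - ref(grid)) ** 2))
```

The student never sees the true labels directly. It only matches the reference, so its quality on unseen inputs is bounded by how well the reference generalizes from the five real examples. The pseudo inputs give the student extra points at which to compare itself with the reference, which is the role the abstract assigns to pseudo training examples.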