Abstract
Unsupervised pre-training and transfer learning are commonly used techniques to initialize training algorithms for neural networks, particularly in settings with limited labeled data. In this paper, we study the effects of unsupervised pre-training and transfer learning on the sample complexity of high-dimensional supervised learning. Specifically, we consider the problem of training a single-layer neural network via online stochastic gradient descent. We establish that pre-training and transfer learning (under concept shift) reduce sample complexity by polynomial factors (in the dimension) under very general assumptions. We also uncover some surprising settings where pre-training grants exponential improvement over random initialization in terms of sample complexity.