The Lottery Ticket Hypothesis: Training Pruned Neural Networks

Abstract

Recent work on neural network pruning indicates that, at training time, neural networks need to be significantly larger in size than is necessary to represent the eventual functions that they learn. This paper articulates a new hypothesis to explain this phenomenon. This conjecture, which we term the "lottery ticket hypothesis," proposes that successful training depends on lucky random initialization of a smaller subcomponent of the network. Larger networks have more of these "lottery tickets," meaning they are more likely to luck out with a subcomponent initialized in a configuration amenable to successful optimization.

This paper conducts a series of experiments with XOR and MNIST that support the lottery ticket hypothesis. In particular, we identify these fortuitously-initialized subcomponents by pruning low-magnitude weights from trained networks. We then demonstrate that these subcomponents can be successfully retrained in isolation so long as the subnetworks are given the same initializations as they had at the beginning of the training process. Initialized as such, these small networks reliably converge successfully, often faster than the original network at the same level of accuracy. However, when these subcomponents are randomly reinitialized or rearranged, they perform worse than the original network. In other words, large networks that train successfully contain small subnetworks with initializations conducive to optimization.

The lottery ticket hypothesis and its connection to pruning are a step toward developing architectures, initializations, and training strategies that make it possible to solve the same problems with much smaller networks.
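
The procedure the abstract describes (prune low-magnitude weights from a trained network, then give the surviving weights back their original initial values) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `magnitude_mask`, `extract_ticket`, and `random_reinit_control` are hypothetical, the sketch handles a single weight matrix, and the exact pruning schedule used in the paper is not stated in the abstract.

```python
import numpy as np


def magnitude_mask(trained, prune_fraction):
    """Return a boolean mask that drops the lowest-magnitude weights.

    `prune_fraction` is the share of weights to remove (assumed to be
    applied to one weight matrix at a time in this sketch).
    """
    flat = np.abs(trained).ravel()
    k = int(round(prune_fraction * flat.size))  # number of weights to prune
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude trained weights.
        drop = np.argsort(flat, kind="stable")[:k]
        mask[drop] = False
    return mask.reshape(trained.shape)


def extract_ticket(initial, trained, prune_fraction):
    """Build the candidate subnetwork: the mask comes from the TRAINED
    weights, while the surviving values come from the ORIGINAL
    initialization."""
    mask = magnitude_mask(trained, prune_fraction)
    return initial * mask, mask


def random_reinit_control(mask, rng, scale=0.1):
    """Control condition: same sparsity pattern, freshly sampled weights.

    The normal distribution and `scale` are assumptions of this sketch.
    """
    return rng.normal(0.0, scale, size=mask.shape) * mask
```

When such a subnetwork is retrained, the mask would also need to be applied after every update (or to the gradients) so that pruned weights stay at zero. Comparing the result of `extract_ticket` against `random_reinit_control` mirrors the contrast the abstract draws between original and random reinitialization.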
