Abstract
Dropout is a regularization technique widely used in training artificial neural networks to mitigate overfitting. It consists of dynamically deactivating subsets of the network during training to promote more robust representations. Despite its widespread adoption, dropout probabilities are often selected heuristically, and theoretical explanations of its success remain sparse. Here, we analytically study dropout in two-layer neural networks trained with online stochastic gradient descent. In the high-dimensional limit, we derive a set of ordinary differential equations that fully characterize the evolution of the network during training and capture the effects of dropout. We obtain a number of exact results describing the generalization error and the optimal dropout probability at short, intermediate, and long training times. Our analysis shows that dropout reduces detrimental correlations between hidden nodes and mitigates the impact of label noise, and that the optimal dropout probability increases with the level of noise in the data. Our results are validated by extensive numerical simulations.
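
For concreteness, a single online SGD step with dropout on the hidden nodes of a two-layer network can be sketched as follows. This is a minimal illustration only, not the setup analyzed in the paper: the erf activation, the second-layer weights fixed to one, the 1/sqrt(N) scaling of the pre-activations, the squared-error loss, and the names `forward`, `sgd_step_with_dropout`, and `keep_prob` are all assumptions made for this sketch.

```python
import math
import random


def forward(weights, x, mask):
    """Output of a two-layer network with second-layer weights fixed to 1.

    Hidden node k contributes only if mask[k] == 1 (the node is kept).
    The erf activation and 1/sqrt(n) scaling are assumptions of this sketch.
    """
    n = len(x)
    out = 0.0
    for w_k, keep in zip(weights, mask):
        if keep:
            pre = sum(wi * xi for wi, xi in zip(w_k, x)) / math.sqrt(n)
            out += math.erf(pre / math.sqrt(2))
    return out


def sgd_step_with_dropout(weights, x, y, keep_prob, lr, rng):
    """One online SGD step on the loss 0.5 * (output - y)**2 with node dropout.

    Updates `weights` in place and returns the sampled dropout mask.
    """
    n = len(x)
    # Sample one independent Bernoulli(keep_prob) bit per hidden node.
    mask = [1 if rng.random() < keep_prob else 0 for _ in weights]
    err = forward(weights, x, mask) - y
    for w_k, keep in zip(weights, mask):
        if not keep:
            continue  # a dropped node receives no gradient on this sample
        pre = sum(wi * xi for wi, xi in zip(w_k, x)) / math.sqrt(n)
        # Derivative of erf(z / sqrt(2)) with respect to z.
        g_prime = math.sqrt(2 / math.pi) * math.exp(-pre * pre / 2)
        scale = lr * err * g_prime / math.sqrt(n)
        for i in range(n):
            w_k[i] -= scale * x[i]
    return mask
```

In the online setting each call would receive a fresh sample `(x, y)`, so the mask is redrawn at every step. Setting `keep_prob = 1.0` recovers plain online SGD under the same assumptions.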