AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Abstract

Neural networks are often over-parameterized and hence benefit from aggressive regularization. Conventional regularization methods, such as Dropout or weight decay, do not leverage the structures of the network's inputs and hidden states. As a result, these conventional methods are less effective than methods that do leverage such structures, such as SpatialDropout and DropBlock, which randomly drop the values in certain contiguous areas of the hidden states and set them to zero. Although the locations of the dropout areas are random, the patterns of SpatialDropout and DropBlock are manually designed and fixed. Here we propose to learn the dropout patterns. In our method, a controller learns to generate a dropout pattern at every channel and layer of a target network, such as a ConvNet or a Transformer. The target network is then trained with the dropout pattern, and its resulting validation performance is used as a signal for the controller to learn from. We show that this method works well for both image recognition on CIFAR-10 and ImageNet and language modeling on Penn Treebank and WikiText-2. The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014. Our code will be available.
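The difference between Dropout and structured methods such as DropBlock lies in the shape of what gets dropped. The sketch below contrasts an element-wise Dropout mask with a DropBlock-style mask of contiguous squares on a single 2D feature map, assuming the map is stored as nested Python lists. The function names and the (block_size, drop_rate) parameterization are hypothetical illustrations, not the paper's pattern representation or any library's API.

```python
import random
from typing import List

Mask = List[List[float]]


def elementwise_mask(h: int, w: int, drop_rate: float, rng: random.Random) -> Mask:
    """Standard Dropout: each unit is zeroed independently with prob drop_rate."""
    return [[0.0 if rng.random() < drop_rate else 1.0 for _ in range(w)]
            for _ in range(h)]


def block_mask(h: int, w: int, block_size: int, drop_rate: float,
               rng: random.Random) -> Mask:
    """DropBlock-style mask: zero out contiguous block_size x block_size squares.

    Each valid top-left corner seeds a block with probability gamma, chosen so
    that (ignoring overlaps between blocks) about drop_rate of the h*w units
    end up zeroed. Assumes block_size <= h and block_size <= w.
    """
    valid_h, valid_w = h - block_size + 1, w - block_size + 1
    gamma = drop_rate * h * w / (block_size ** 2 * valid_h * valid_w)
    mask = [[1.0] * w for _ in range(h)]
    for i in range(valid_h):
        for j in range(valid_w):
            if rng.random() < gamma:
                # Drop the whole square anchored at (i, j), not a single unit.
                for r in range(i, i + block_size):
                    for c in range(j, j + block_size):
                        mask[r][c] = 0.0
    return mask


def apply_mask(x: Mask, mask: Mask) -> Mask:
    """Zero the masked units and rescale the survivors to preserve the total."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    scale = total / kept if kept else 0.0
    return [[xv * mv * scale for xv, mv in zip(xr, mr)]
            for xr, mr in zip(x, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    # '#' marks kept units, '.' marks dropped ones.
    for name, m in [("elementwise", elementwise_mask(8, 8, 0.3, rng)),
                    ("block", block_mask(8, 8, 3, 0.3, rng))]:
        print(name)
        for row in m:
            print("".join("#" if v else "." for v in row))
```

In this sketch block_size and drop_rate are fixed by hand, which corresponds to the manually designed setting the abstract contrasts against; in the proposed method, a controller would instead generate the pattern for every channel and layer, guided by validation performance.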
