Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

Abstract

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example. Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. We find significant speedups in training neural networks with multiplicative Gaussian perturbations. We show that flipout is effective at regularizing LSTMs and outperforms previous methods. Flipout also enables us to vectorize evolution strategies: in our experiments, a single GPU with flipout can match the throughput of at least 40 CPU cores using existing methods, equivalent to a factor-of-4 cost reduction on Amazon Web Services.
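The abstract does not spell out how the per-example perturbations are produced. The sketch below assumes the sign-flipping construction for a single dense layer. One perturbation ΔŴ is drawn per mini-batch, and its distribution is assumed symmetric about zero with independent entries. Example n then uses ΔW_n = ΔŴ ∘ (s_n r_nᵀ), where s_n and r_n are random ±1 vectors over the layer's inputs and outputs. Because the sign pattern factorizes, the whole batch needs only two extra matrix products rather than one weight sample per example. The function names `flipout_dense` and `explicit_per_example` are hypothetical, and the additive Gaussian ΔŴ in the usage lines is an illustrative choice.

```python
import numpy as np


def flipout_dense(x, w_mean, delta_w_shared, rng):
    """Pre-activations of a dense layer with flipout-style perturbations.

    x:              (batch, d_in) inputs, one example per row
    w_mean:         (d_in, d_out) mean weights
    delta_w_shared: (d_in, d_out) one perturbation sample shared by the batch;
                    assumed symmetric about zero with independent entries
    """
    batch, d_in = x.shape
    d_out = w_mean.shape[1]
    # Fresh random sign vectors for every example: s_n over inputs, r_n over outputs.
    s = rng.choice([-1.0, 1.0], size=(batch, d_in))
    r = rng.choice([-1.0, 1.0], size=(batch, d_out))
    # Shared mean term plus a per-example perturbation term. Flipping the
    # input signs, multiplying by the shared perturbation, then flipping the
    # output signs equals using delta_w_shared * outer(s_n, r_n) for row n.
    out = x @ w_mean + ((x * s) @ delta_w_shared) * r
    return out, s, r


def explicit_per_example(x, w_mean, delta_w_shared, s, r):
    """Reference: build each example's perturbed weight matrix explicitly."""
    rows = []
    for n in range(x.shape[0]):
        delta_w_n = delta_w_shared * np.outer(s[n], r[n])
        rows.append(x[n] @ (w_mean + delta_w_n))
    return np.stack(rows)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 5))
    w_mean = rng.standard_normal((5, 3))
    # Illustrative choice: additive Gaussian perturbation with std 0.1.
    delta = 0.1 * rng.standard_normal((5, 3))
    y, s, r = flipout_dense(x, w_mean, delta, rng)
    print(np.allclose(y, explicit_per_example(x, w_mean, delta, s, r)))  # prints True
```

The loop in `explicit_per_example` materializes every ΔW_n only to confirm that the batched expression computes the same pre-activations; a practical layer would keep just the vectorized form, whose cost does not grow with the number of distinct weight samples.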
