We introduce backdrop, a flexible and simple-to-implement method, intuitively described as dropout acting only along the backpropagation pipeline. Backdrop is implemented via one or more masking layers inserted at specific points along the network. Each backdrop masking layer acts as the identity in the forward pass, but randomly masks parts of the backward gradient propagation. Intuitively, inserting a backdrop layer after any convolutional layer leads to stochastic gradients corresponding to features of that scale. Therefore, backdrop is well suited for problems in which the data have a multi-scale, hierarchical structure. Backdrop can also be applied to problems with non-decomposable loss functions, where standard SGD methods are not well suited. We perform a number of experiments and demonstrate that backdrop leads to significant improvements in generalization.
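To make the "identity forward, masked backward" behaviour concrete, here is a minimal plain-Python sketch of a single masking layer. It assumes elementwise Bernoulli masking with a keep probability `keep_prob`. The class name `BackdropMask`, the parameters `keep_prob`, `rescale` and `seed`, and the list-based interface are all hypothetical illustrations rather than the method's actual implementation. In particular, whether surviving gradients should be rescaled (as in inverted dropout) is not specified here, so `rescale` is only an illustrative option that defaults to off.

```python
import random
from typing import List, Optional


class BackdropMask:
    """Hypothetical sketch of a backdrop masking layer.

    Forward pass: returns its input unchanged (identity).
    Backward pass: zeroes a random subset of the incoming gradient entries.
    """

    def __init__(self, keep_prob: float = 0.5, rescale: bool = False,
                 seed: Optional[int] = None) -> None:
        if not 0.0 < keep_prob <= 1.0:
            raise ValueError("keep_prob must be in (0, 1]")
        self.keep_prob = keep_prob
        self.rescale = rescale  # assumption: optional inverted-dropout-style scaling
        self.rng = random.Random(seed)
        self.mask: List[float] = []

    def forward(self, x: List[float]) -> List[float]:
        # Draw a fresh mask for this pass; the activations pass through untouched.
        self.mask = [1.0 if self.rng.random() < self.keep_prob else 0.0 for _ in x]
        return list(x)

    def backward(self, grad_out: List[float]) -> List[float]:
        # Gradient entries at masked positions are dropped; kept entries flow through,
        # optionally rescaled by 1 / keep_prob.
        if len(grad_out) != len(self.mask):
            raise ValueError("gradient length does not match the forward input")
        scale = 1.0 / self.keep_prob if self.rescale else 1.0
        return [g * m * scale for g, m in zip(grad_out, self.mask)]


if __name__ == "__main__":
    layer = BackdropMask(keep_prob=0.5, seed=0)
    h = [0.3, -1.2, 0.8, 2.0]
    assert layer.forward(h) == h  # identity in the forward pass
    print(layer.backward([1.0, 1.0, 1.0, 1.0]))  # each entry is 0.0 or 1.0
```

In a real autodiff framework the same idea would be expressed as a custom-gradient operation whose forward function returns its input and whose backward function multiplies the upstream gradient by a random mask. Placing such a layer after a particular convolutional layer restricts which features at that scale contribute to each stochastic gradient step.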