Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks

Abstract

Batch Normalization (BN) is a highly successful and widely used batch-dependent training method. Its use of mini-batch statistics to normalize the activations introduces dependence between samples, which can hurt training if the mini-batch size is too small or if the samples are correlated. Several alternatives, such as Batch Renormalization and Group Normalization (GN), have been proposed to address these issues. However, they either do not match the performance of BN for large batches, still exhibit degraded performance for smaller batches, or introduce artificial constraints on the model architecture. In this paper we propose the Filter Response Normalization (FRN) layer, a novel combination of a normalization and an activation function that can be used as a drop-in replacement for other normalizations and activations. Our method operates on each activation map of each batch sample independently, eliminating the dependency on other batch samples or on other channels of the same sample. Our method outperforms BN and all alternatives in a variety of settings for all batch sizes. With large mini-batch sizes, the FRN layer achieves $\approx 0.7-1.0\%$ higher top-1 validation accuracy than BN on ImageNet classification with the InceptionV3 and ResnetV2-50 architectures. Further, it performs $>1\%$ better than GN on the same problem in the small mini-batch size regime. For object detection on the COCO dataset, the FRN layer outperforms all other methods by at least $0.3-0.5\%$ in all batch size regimes.
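The abstract does not give the formula, so the sketch below assumes the FRN definition from the body of the paper. For each sample and channel, the mean squared response over spatial positions is $\nu^2 = \frac{1}{N}\sum_i x_i^2$ (here $N$ is the number of spatial positions). The layer then computes $\hat{x} = x / \sqrt{\nu^2 + \epsilon}$ and $y = \gamma \hat{x} + \beta$, followed by a thresholded linear unit $z = \max(y, \tau)$. The function name `frn_tlu`, the NCHW nested-list layout, and the default `eps=1e-6` are illustrative assumptions, not the authors' implementation. The point of the sketch is that no statistic is ever computed across samples or across channels.

```python
import math
from typing import List

Map = List[List[float]]  # one H x W activation map (filter response)


def frn_tlu(x: List[List[Map]], gamma: List[float], beta: List[float],
            tau: List[float], eps: float = 1e-6) -> List[List[Map]]:
    """Sketch of Filter Response Normalization followed by a thresholded
    linear unit (TLU).

    x is indexed x[n][c][h][w] (NCHW, an assumed layout). gamma, beta and
    tau hold one learned value per channel c. eps=1e-6 is an arbitrary
    illustrative choice.
    """
    out = []
    for sample in x:                       # each batch sample handled on its own
        sample_out = []
        for c, fmap in enumerate(sample):  # each channel's map handled on its own
            values = [v for row in fmap for v in row]
            # nu^2: mean squared response over spatial positions only
            nu2 = sum(v * v for v in values) / len(values)
            scale = 1.0 / math.sqrt(nu2 + eps)
            # normalize, apply the per-channel affine transform, then TLU: max(y, tau)
            sample_out.append([
                [max(gamma[c] * v * scale + beta[c], tau[c]) for v in row]
                for row in fmap
            ])
        out.append(sample_out)
    return out
```

Because the loops never read another sample or another channel, `frn_tlu(a + b, ...)[0]` equals `frn_tlu(a, ...)[0]` for any extra batch `b`. This is the batch independence the abstract describes. It is also why the result does not depend on mini-batch size, unlike BN.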
