High-Performance Large-Scale Image Recognition Without Normalization

Abstract

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets
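The abstract names adaptive gradient clipping without stating the rule. As a rough illustration only, the sketch below assumes a unit-wise rule in which each row of a gradient matrix is rescaled whenever the ratio of its norm to the norm of the matching weight row exceeds a threshold. The function name `adaptive_clip`, the list-of-rows layout, and the default values of `clip` and `eps` are assumptions made for this example, not details taken from the abstract or the linked repository.

```python
import math


def adaptive_clip(weights, grads, clip=0.01, eps=1e-3):
    """Rescale each gradient row whose norm exceeds clip * (weight row norm).

    `weights` and `grads` are lists of rows (lists of floats) of equal shape.
    `clip` and `eps` are illustrative hyperparameters (assumed values).
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # Floor the weight norm at eps so zero-initialized rows can still update.
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip * w_norm
        if g_norm > max_norm:
            # Shrink the row so its norm equals max_norm; direction is unchanged.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


# Hypothetical 2x2 example: the first row is clipped, the second is left alone.
example_weights = [[3.0, 4.0], [1.0, 0.0]]
example_grads = [[0.3, 0.4], [0.001, 0.0]]
example_clipped = adaptive_clip(example_weights, example_grads, clip=0.01)
```

Because the threshold scales with the weight norm, the clipping is relative rather than absolute: a gradient that is large for a small-norm row may be acceptable for a large-norm one.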
