Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches

Abstract

As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g., fewer than 10 samples in a mini-batch), since statistics estimated from so few samples are unreliable. In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions, which treat each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman filtering. BKN has two appealing properties. First, it enables more stable training and faster convergence than previous works. Second, DNNs trained with BKN perform substantially better than those trained with BN and its variants, especially when very small mini-batches are used. On the ImageNet image classification benchmark, BKN-powered networks improve upon the best-published model-zoo results, reaching 74.0% top-1 validation accuracy for InceptionV2. More importantly, BKN achieves comparable accuracy with much smaller batch sizes, such as 64 times smaller on CIFAR-10/100 and 8 times smaller on ImageNet.
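
The claim that micro-batch statistics are unreliable can be illustrated with a minimal NumPy sketch. It is not part of the BKN method; it only assumes synthetic activations drawn from a standard normal distribution (true mean 0, variance 1), and the helper name `batch_mean_spread` is hypothetical. It measures how much the per-batch mean, the quantity BN relies on, fluctuates from batch to batch at different batch sizes.

```python
import numpy as np


def batch_mean_spread(batch_size, n_trials=2000, seed=0):
    """Std. dev. of the per-batch mean across many synthetic batches.

    Assumption: activations are i.i.d. standard normal, so the true mean is 0
    and any spread in the estimated mean is pure estimation noise.
    """
    rng = np.random.default_rng(seed)
    # Each row is one mini-batch of scalar activations.
    samples = rng.standard_normal((n_trials, batch_size))
    batch_means = samples.mean(axis=1)
    return batch_means.std()


# Compare two micro-batch sizes with a conventional mini-batch size.
spreads = {b: batch_mean_spread(b) for b in (2, 8, 128)}
for b, s in spreads.items():
    print(f"batch size {b:>3}: std of estimated mean = {s:.3f}")
```

Under this assumption the spread shrinks roughly as one over the square root of the batch size, so a batch of 2 yields a far noisier mean estimate than a batch of 128. This is the kind of noise that a cross-layer estimate, as described in the abstract, is meant to reduce.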
