Learning low-precision neural networks without Straight-Through Estimator (STE)

Abstract

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). AB avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight w_q and the corresponding full-precision weight w, with non-trainable scalar coefficients \alpha and 1-\alpha. During training, \alpha is gradually increased from 0 to 1; the gradient updates to the weights flow through the full-precision term, (1-\alpha)w, of the affine combination; and the model is converted progressively from full precision to low precision. To evaluate the method, a 1-bit BinaryNet on the CIFAR10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet_50 v1/2 models on the ImageNet dataset are trained using the alpha-blending approach. The evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82% and 2.93%, respectively, compared to the results of STE-based quantization.
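
The blending described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the sign quantizer, the linear schedule for \alpha, the toy quadratic loss, and every function name below are hypothetical choices that the abstract does not specify. What the sketch does take from the abstract is the blended weight (1-\alpha)w + \alpha w_q and the fact that the gradient reaches w only through the (1-\alpha)w term, with w_q treated as a constant.

```python
def sign_quantize(w):
    """Hypothetical 1-bit quantizer (an assumption): map each weight to +1 or -1."""
    return [1.0 if x >= 0 else -1.0 for x in w]


def alpha_blend(w, alpha, quantize=sign_quantize):
    """Blended weight: (1 - alpha) * w + alpha * w_q, elementwise."""
    w_q = quantize(w)
    return [(1.0 - alpha) * x + alpha * q for x, q in zip(w, w_q)]


def grad_wrt_full_precision(grad_blended, alpha):
    """Chain rule with w_q held constant: dL/dw = (1 - alpha) * dL/dw_blended."""
    return [(1.0 - alpha) * g for g in grad_blended]


def linear_alpha(step, total_steps):
    """Assumed schedule: alpha rises linearly from 0 to 1."""
    return min(1.0, step / float(total_steps))


def train(w, target, total_steps=100, lr=0.1):
    """Toy loop on the loss 0.5 * sum((w_blended - target)^2) (an assumed loss)."""
    for step in range(total_steps + 1):
        alpha = linear_alpha(step, total_steps)
        w_ab = alpha_blend(w, alpha)
        # Gradient of the toy loss with respect to the blended weight.
        grad_ab = [a - t for a, t in zip(w_ab, target)]
        # No STE: the update flows only through the (1 - alpha) * w term.
        grad_w = grad_wrt_full_precision(grad_ab, alpha)
        w = [x - lr * g for x, g in zip(w, grad_w)]
    # At alpha = 1 the blended weight is exactly the quantized weight.
    return w, alpha_blend(w, 1.0)


if __name__ == "__main__":
    w_final, w_quantized = train([0.3, -1.7], [1.0, -1.0])
    print(w_final, w_quantized)
```

At \alpha = 0 the blended weight equals the full-precision weight, and at \alpha = 1 it equals the quantized weight and the gradient with respect to w vanishes, so the full-precision weights stop changing once the model is fully low-precision.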
