Post-training Quantization for Neural Networks with Provable Guarantees

Abstract

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized (e.g., 4-bit, or binary) counterparts, massive savings in computation cost, memory, and power consumption are attained. We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism, and rigorously analyze its error. We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights -- i.e., level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures. To empirically evaluate the method, we quantize several common architectures with few bits per weight, and test them on ImageNet, showing only minor loss of accuracy. We also demonstrate that standard modifications, such as bias correction and mixed precision quantization, further improve accuracy.
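
To make the greedy path-following idea concrete, the following is a minimal sketch of one common way such a scheme is formulated for a single neuron: the weights are quantized one at a time, and each choice is made so that the running error in the neuron's output on some calibration data stays small. The abstract does not spell out the update rule, so everything here is an assumption for illustration: the function names `nearest` and `greedy_quantize`, the projection-based update, the synthetic Gaussian data, and the five-level alphabet are all hypothetical and are not taken from the paper.

```python
import random


def nearest(alphabet, x):
    """Return the alphabet element closest to x."""
    return min(alphabet, key=lambda a: abs(a - x))


def greedy_quantize(X, w, alphabet):
    """Greedily quantize the weights of one neuron (illustrative sketch).

    X        : list of N feature columns, each a list of m sample values
    w        : list of N real-valued weights
    alphabet : list of allowed quantized values
    Returns (q, u): quantized weights and the final output residual,
    where u[i] = sum_t (w[t] - q[t]) * X[t][i].
    """
    m = len(X[0])
    u = [0.0] * m  # running output error accumulated so far
    q = []
    for x_t, w_t in zip(X, w):
        norm_sq = sum(v * v for v in x_t)
        if norm_sq == 0.0:
            # A zero feature cannot affect the output; just round the weight.
            q_t = nearest(alphabet, w_t)
        else:
            # Project (previous error + this weight's contribution) onto x_t,
            # then round the resulting coefficient to the alphabet.
            proj = sum(xv * (uv + w_t * xv) for xv, uv in zip(x_t, u)) / norm_sq
            q_t = nearest(alphabet, proj)
        # Update the running error with what this step left uncorrected.
        u = [uv + (w_t - q_t) * xv for uv, xv in zip(u, x_t)]
        q.append(q_t)
    return q, u


# Synthetic example (assumed setup): m samples, N weights, Gaussian features.
rng = random.Random(0)
m, N = 16, 64
X = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(N)]
w = [rng.uniform(-1.0, 1.0) for _ in range(N)]
alphabet = [-1.0, -0.5, 0.0, 0.5, 1.0]

q, u = greedy_quantize(X, w, alphabet)

# Relative square error of the quantized neuron's output on the samples.
out = [sum(w[t] * X[t][i] for t in range(N)) for i in range(m)]
rel_err = sum(v * v for v in u) / sum(v * v for v in out)
print("relative square error:", rel_err)
```

Rerunning the sketch with larger `N` is one informal way to see how the relative square error behaves as the number of weights grows, though a toy run like this is no substitute for the analysis or the ImageNet experiments described above.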
