Data-Free Quantization through Weight Equalization and Bias Correction

Abstract

We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference in modern deep learning hardware architectures. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects biases in the error that are introduced during quantization. This improves the accuracy of quantized models, and the method can be applied to almost any model with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
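
To make the weight-equalization idea concrete, here is a minimal sketch for two fully connected layers joined by a ReLU. It relies only on the fact that ReLU satisfies relu(s * x) = s * relu(x) for s > 0, so each hidden channel can be divided by a positive scale in the first layer and multiplied by the same scale in the second without changing the network output. The function names (`forward`, `equalize`), the example weights, and the specific scale s = sqrt(r1 / r2) are assumptions made for illustration; the abstract does not state the exact scaling rule, and this is not the authors' API.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, b1, W2, b2, x):
    """Two-layer network: y = W2 @ relu(W1 @ x + b1) + b2."""
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return [a + b for a, b in zip(matvec(W2, h), b2)]

def equalize(W1, b1, W2):
    """Rescale each hidden channel so both layers share the same weight range.

    Assumed rule (illustrative): for hidden channel i, with
    r1 = max|W1[i, :]| and r2 = max|W2[:, i]|, use s = sqrt(r1 / r2).
    Row i of W1 and b1[i] are divided by s; column i of W2 is multiplied by s.
    """
    W1_eq = [row[:] for row in W1]
    b1_eq = b1[:]
    W2_eq = [row[:] for row in W2]
    for i in range(len(W1)):
        r1 = max(abs(w) for w in W1[i])
        r2 = max(abs(row[i]) for row in W2)
        if r1 == 0.0 or r2 == 0.0:
            continue  # dead channel: leave it unchanged
        s = math.sqrt(r1 / r2)
        W1_eq[i] = [w / s for w in W1[i]]
        b1_eq[i] = b1[i] / s
        for row in W2_eq:
            row[i] *= s
    return W1_eq, b1_eq, W2_eq

# Hypothetical weights with badly mismatched per-channel ranges.
W1 = [[8.0, -4.0], [0.1, 0.05]]
b1 = [1.0, -0.02]
W2 = [[0.25, 2.0], [-0.5, 4.0]]
b2 = [0.0, 0.5]
x = [0.3, -0.7]

W1_eq, b1_eq, W2_eq = equalize(W1, b1, W2)
print("original :", forward(W1, b1, W2, b2, x))
print("equalized:", forward(W1_eq, b1_eq, W2_eq, b2, x))
```

After equalization the two outputs agree up to floating-point rounding, while the per-channel weight ranges of the two layers match. A single per-tensor 8-bit quantization grid then wastes less of its resolution on a few outlier channels.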
