Abstract
Quantization is a fundamental optimization for many machine-learning usecases, including compressing gradients, model weights and activations, anddatasets. The most accurate form of quantization is \emph{adaptive}, where theerror is minimized with respect to a given input, rather than optimizing forthe worst case. However, optimal adaptive quantization methods are consideredinfeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and presentalgorithms that find optimal solutions with asymptotically improved time andspace complexity. We also present an even faster near-optimal algorithm forlarge inputs. Our experiments show our algorithms may open the door to usingAVQ more extensively in a variety of machine learning applications.