Image Compression with Product Quantized Masked Image Modeling

Abstract

Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers very strong compression performance. This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed. In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression. We build upon the VQ-VAE framework and introduce several modifications. First, we replace the vanilla vector quantizer with a product quantizer. This intermediate solution between vector and scalar quantization allows for a much wider set of rate-distortion points: it implicitly defines high-quality quantizers that would otherwise require intractably large codebooks. Second, inspired by the success of Masked Image Modeling (MIM) in the context of self-supervised learning and generative image models, we propose a novel conditional entropy model which improves entropy coding by modelling the co-dependencies of the quantized latent codes. The resulting PQ-MIM model is surprisingly effective: its compression performance is on par with recent hyperprior methods. It also outperforms HiFiC in terms of FID and KID metrics when optimized with perceptual losses (e.g., adversarial). Finally, since PQ-MIM is compatible with image generation frameworks, we show qualitatively that it can operate under a hybrid mode between compression and generation, with no further training or finetuning. As a result, we explore the extreme compression regime where an image is compressed into 200 bytes, i.e., less than a tweet.
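
To illustrate why a product quantizer implicitly defines a codebook that would be intractable to store explicitly, the following is a minimal sketch in plain Python. It is not the authors' implementation: the codebooks here are random rather than learned, and the names `pq_encode`, `pq_decode`, `bits_per_vector` and the sizes `M`, `K`, `D` are hypothetical choices made only for this example. The vector is split into `M` sub-vectors, each quantized independently against its own codebook of `K` entries, so only `M * K` codewords are stored while `K ** M` distinct reconstructions are reachable.

```python
import math
import random


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def pq_encode(x, codebooks):
    """Split x into len(codebooks) equal sub-vectors and quantize each one."""
    m = len(codebooks)
    d = len(x) // m  # sub-vector dimension; assumes len(x) is divisible by m
    return [nearest(codebooks[i], x[i * d:(i + 1) * d]) for i in range(m)]


def pq_decode(codes, codebooks):
    """Rebuild the vector by concatenating the selected codewords."""
    out = []
    for i, k in enumerate(codes):
        out.extend(codebooks[i][k])
    return out


def bits_per_vector(m, k):
    """Fixed-length cost of one PQ code, before any entropy coding."""
    return m * math.log2(k)


# Hypothetical sizes: 4 sub-vectors, 16 codewords each, latent dimension 8.
M, K, D = 4, 16, 8
random.seed(0)

# Random codebooks stand in for learned ones (an assumption of this sketch).
codebooks = [
    [[random.gauss(0, 1) for _ in range(D // M)] for _ in range(K)]
    for _ in range(M)
]

x = [random.gauss(0, 1) for _ in range(D)]
codes = pq_encode(x, codebooks)
x_hat = pq_decode(codes, codebooks)

stored_codewords = M * K          # what actually has to be kept in memory
implicit_codebook_size = K ** M   # distinct reconstructions that are reachable
```

With these sizes, 64 stored codewords cover 65,536 possible reconstructions at 16 bits per vector. Setting `M = 1` recovers a plain vector quantizer, while setting `M = D` quantizes each dimension on its own, which is the sense in which product quantization sits between vector and scalar quantization.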
