Insights from Generative Modeling for Neural Video Compression

Abstract

While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on full-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
