VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Abstract

Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capability. This raises a fundamental question: can we extend the success of a generative image upsampler to the VSR task while preserving the temporal consistency? We introduce VideoGigaGAN, a new generative VSR model that can produce videos with high-frequency details and temporal consistency. VideoGigaGAN builds upon a large-scale image upsampler -- GigaGAN. Simply inflating GigaGAN to a video model by adding temporal modules produces severe temporal flickering. We identify several key issues and propose techniques that significantly improve the temporal consistency of upsampled videos. Our experiments show that, unlike previous VSR methods, VideoGigaGAN generates temporally consistent videos with more fine-grained appearance details. We validate the effectiveness of VideoGigaGAN by comparing it with state-of-the-art VSR models on public datasets and showcasing video results with $8\times$ super-resolution.
