LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Abstract

Existing works on video frame interpolation (VFI) mostly employ deep neural networks trained to minimize the L1 or L2 distance between their outputs and ground-truth frames. Despite recent advances, existing VFI methods tend to produce perceptually inferior results, particularly for challenging scenarios including large motions and dynamic textures. Towards developing perceptually-oriented VFI methods, we propose latent diffusion model-based VFI, LDMVFI. This approaches the VFI problem from a generative perspective by formulating it as a conditional generation problem. As the first effort to address VFI using latent diffusion models, we rigorously benchmark our method following the common evaluation protocol adopted in the existing VFI literature. Our quantitative experiments and user study indicate that LDMVFI is able to interpolate video content with superior perceptual quality compared to the state of the art, even in the high-resolution regime. Our source code will be made available here.
