NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

Abstract

We propose a video editing framework, NaRCan, which integrates a hybrid deformation field and diffusion prior to generate high-quality natural canonical images to represent the input video. Our approach utilizes homography to model global motion and employs multi-layer perceptrons (MLPs) to capture local residual deformations, enhancing the model's ability to handle complex video dynamics. By introducing a diffusion prior from the early stages of training, our model ensures that the generated images retain a high-quality natural appearance, making the produced canonical images suitable for various downstream tasks in video editing, a capability not achieved by current canonical-based methods. Furthermore, we incorporate low-rank adaptation (LoRA) fine-tuning and introduce a noise and diffusion prior update scheduling technique that accelerates the training process by 14 times. Extensive experimental results show that our method outperforms existing approaches in various video editing tasks and produces coherent and high-quality edited video sequences. See our project page for video results at https://koi953215.github.io/NaRCan_page/.
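
The hybrid deformation field described in the abstract can be illustrated with a minimal sketch: a homography supplies the global mapping, and a small MLP adds a per-point residual offset. This is not the authors' implementation. The names `apply_homography`, `ResidualMLP`, and `hybrid_deform`, the layer sizes, the choice of (x, y, t) as the MLP input, and the zero-initialised output layer are all assumptions made for illustration.

```python
import math
import random

def apply_homography(H, x, y):
    """Map the point (x, y) through a 3x3 homography H (nested lists)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # perspective divide

class ResidualMLP:
    """Tiny two-layer MLP mapping (x, y, t) to a 2-D offset (hypothetical sizes)."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Assumption: a zero-initialised output layer, so the residual starts
        # at zero and the field initially equals the pure homography.
        self.w2 = [[0.0] * hidden for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def __call__(self, x, y, t):
        inputs = (x, y, t)
        hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return tuple(sum(w * v for w, v in zip(row, hidden)) + b
                     for row, b in zip(self.w2, self.b2))

def hybrid_deform(H, mlp, x, y, t):
    """Global homography mapping plus a local residual offset from the MLP."""
    gx, gy = apply_homography(H, x, y)
    dx, dy = mlp(x, y, t)
    return gx + dx, gy + dy

# Usage: a pure translation by (2, 3) as the global motion for one frame.
H_translate = [[1.0, 0.0, 2.0],
               [0.0, 1.0, 3.0],
               [0.0, 0.0, 1.0]]
mlp = ResidualMLP()
canonical_xy = hybrid_deform(H_translate, mlp, 1.0, 1.0, 0.5)
```

In this sketch the homography carries the bulk of the motion, so the MLP only has to represent small corrections. In a real system both the per-frame homographies and the MLP weights would be optimised jointly, which the sketch does not attempt.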
