Fillerbuster: Multi-View Scene Completion for Casual Captures

Abstract

We present Fillerbuster, a method that completes unknown regions of a 3D scene using a novel large-scale multi-view latent diffusion transformer. Casual captures are often sparse and miss surrounding content behind objects or above the scene. Existing methods are ill-suited to this challenge because they focus either on making the known pixels look good with sparse-view priors or on creating the missing sides of objects from just one or two photos. In practice, we often have hundreds of input frames and want to complete areas that are missing from, and unobserved in, those frames. Additionally, the camera parameters of these images are often unknown. Our solution is to train a generative model that can consume a large context of input frames while generating unknown target views and, when desired, recovering image poses. We show results on completing partial captures from two existing datasets. We also present an uncalibrated scene completion task, in which our unified model both predicts poses and creates new content. Our model is the first to predict many images and poses jointly for scene completion.
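The abstract does not say how a single model switches between completing calibrated captures, recovering poses, and completing uncalibrated captures. One common way to set up such a unified multi-view model, assumed here for illustration and not taken from the paper, is to mark each frame's image and pose as either clean context or a noised target that the model must generate. The sketch below only builds those per-frame masks. The names `FrameMask`, `build_masks`, `count_generated`, and the task strings are hypothetical, and treating target poses as generated in the uncalibrated case is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FrameMask:
    # True = provided to the model as clean context; False = noised and generated.
    image_known: bool
    pose_known: bool


def build_masks(num_inputs: int, num_targets: int, task: str) -> List[FrameMask]:
    """Hypothetical per-frame conditioning masks for a joint image+pose model."""
    if task == "complete":
        # Calibrated completion: all poses given, target images generated.
        inputs, targets = FrameMask(True, True), FrameMask(False, True)
    elif task == "pose":
        # Pose recovery only: images given, poses generated, no new views.
        if num_targets:
            raise ValueError("pose recovery uses input frames only")
        inputs, targets = FrameMask(True, False), FrameMask(True, False)
    elif task == "uncalibrated":
        # Uncalibrated completion (assumption): every pose and every
        # target image is generated.
        inputs, targets = FrameMask(True, False), FrameMask(False, False)
    else:
        raise ValueError(f"unknown task: {task}")
    # FrameMask is frozen, so repeating the same instance is safe.
    return [inputs] * num_inputs + [targets] * num_targets


def count_generated(masks: List[FrameMask]) -> Tuple[int, int]:
    """Return (#images to generate, #poses to generate)."""
    images = sum(not m.image_known for m in masks)
    poses = sum(not m.pose_known for m in masks)
    return images, poses


if __name__ == "__main__":
    for task, n_in, n_tgt in [("complete", 100, 8), ("pose", 50, 0), ("uncalibrated", 100, 8)]:
        print(task, count_generated(build_masks(n_in, n_tgt, task)))
```

With 100 input frames and 8 targets, the calibrated task generates 8 images and no poses, while the uncalibrated task generates 8 images and all 108 poses. The same network weights would serve every task; only the masks change.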
