Abstract
Recent advances in 3D scene reconstruction enable real-time viewing in virtual and augmented reality. To support interactive operations that improve immersion, such as moving or editing objects, 3D scene inpainting methods have been proposed to repair or complete the altered geometry. However, current approaches rely on lengthy and computationally intensive optimization, making them impractical for real-time or online applications. We propose InstaInpaint, a reference-based feed-forward framework that produces 3D scene inpainting from a 2D inpainting proposal within 0.4 seconds. We develop a self-supervised masked-finetuning strategy to enable training of our custom large reconstruction model (LRM) on a large-scale dataset. Through extensive experiments, we analyze and identify several key designs that improve generalization, textural consistency, and geometric correctness. InstaInpaint achieves a 1000x speed-up over prior methods while maintaining state-of-the-art performance across two standard benchmarks. Moreover, we show that InstaInpaint generalizes well to flexible downstream applications such as object insertion and multi-region inpainting. More video results are available at our project page: https://dhmbb2.github.io/InstaInpaint_page/.