YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting

Abstract

Fast and flexible 3D scene reconstruction from unstructured image collections remains a significant challenge. We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. Our model is highly versatile and operates effectively on inputs that are posed or unposed, and calibrated or uncalibrated. YoNoSplat predicts local Gaussians and a camera pose for each view, and these are aggregated into a global representation using either the predicted or the provided poses. To overcome the inherent difficulty of jointly learning 3D Gaussians and camera parameters, we introduce a novel mixing training strategy. This approach mitigates the entanglement between the two tasks by initially using ground-truth poses to aggregate local Gaussians and then gradually transitioning to a mix of predicted and ground-truth poses, which prevents both training instability and exposure bias. We further resolve the scale ambiguity problem through a novel pairwise camera-distance normalization scheme and by embedding camera intrinsics into the network. YoNoSplat also predicts intrinsic parameters, which makes it applicable to uncalibrated inputs. YoNoSplat is highly efficient, reconstructing a scene from 100 views (at 280×518 resolution) in just 2.69 seconds on an NVIDIA GH200 GPU. It achieves state-of-the-art performance on standard benchmarks in both pose-free and pose-dependent settings. Our project page is at https://botaoye.github.io/yonosplat/.
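
The three mechanisms named in the abstract (pose-based aggregation of local Gaussians, pairwise camera-distance normalization, and the ground-truth-to-mixed pose schedule) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. All function names are hypothetical, and so are these choices: using the mean (rather than, for example, the maximum) pairwise camera-center distance as the scale, a linear ramp with made-up step counts, a cap of 0.5 on the predicted-pose probability, and drawing one pose source per training sample. Poses are assumed to be camera-to-world (rotation, translation) pairs, only Gaussian means are transformed (a full version would also rotate covariances), and the intrinsics embedding is not modeled.

```python
import itertools
import math
import random

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]
Pose = tuple[Mat3, Vec3]  # assumed camera-to-world (rotation, translation)


def mean_pairwise_distance(centers: list[Vec3]) -> float:
    """Mean Euclidean distance over all unordered pairs of camera centers."""
    pairs = list(itertools.combinations(centers, 2))
    if not pairs:
        return 1.0  # a single view has no pairwise distance; keep the scale as is
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def normalize_poses(poses: list[Pose]) -> list[Pose]:
    """Rescale translations so the mean pairwise camera-center distance becomes 1."""
    scale = mean_pairwise_distance([t for _, t in poses])
    if scale == 0.0:
        scale = 1.0  # coincident cameras: nothing to normalize against
    return [(rot, tuple(v / scale for v in t)) for rot, t in poses]


def predicted_pose_probability(step: int, warmup_steps: int,
                               ramp_steps: int, p_max: float) -> float:
    """0 during warmup (ground truth only), then a linear ramp up to p_max."""
    if step < warmup_steps:
        return 0.0
    progress = min(1.0, (step - warmup_steps) / ramp_steps)
    return p_max * progress


def select_poses(step: int, gt_poses: list[Pose], pred_poses: list[Pose],
                 rng: random.Random, warmup_steps: int = 1000,
                 ramp_steps: int = 9000, p_max: float = 0.5) -> list[Pose]:
    """Choose which poses aggregate this sample's Gaussians (hypothetical defaults)."""
    p = predicted_pose_probability(step, warmup_steps, ramp_steps, p_max)
    return pred_poses if rng.random() < p else gt_poses


def transform_point(pose: Pose, p: Vec3) -> Vec3:
    """Map a point from a camera's local frame to the world frame: R @ p + t."""
    rotation, translation = pose
    return tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                 for i in range(3))


def aggregate_gaussian_means(local_means: list[list[Vec3]],
                             poses: list[Pose]) -> list[Vec3]:
    """Merge per-view Gaussian means into one world-frame set."""
    if len(local_means) != len(poses):
        raise ValueError("need exactly one pose per view")
    world: list[Vec3] = []
    for means, pose in zip(local_means, poses):
        world.extend(transform_point(pose, m) for m in means)
    return world


if __name__ == "__main__":
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    gt = normalize_poses([(identity, (0.0, 0.0, 0.0)), (identity, (2.0, 0.0, 0.0))])
    poses = select_poses(step=0, gt_poses=gt, pred_poses=gt, rng=random.Random(0))
    print(aggregate_gaussian_means([[(0.0, 0.0, 1.0)], [(0.0, 0.0, 1.0)]], poses))
```

In this sketch the warmup keeps early training on ground-truth poses, so immature pose predictions cannot corrupt the aggregated Gaussians; the later ramp mixes in predicted poses so that training also sees the kind of poses used at inference, which is the exposure-bias concern the abstract raises.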
