6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

Abstract

We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g. iNeRF), which also require an initial camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each of the ellipsoids that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which in turn are used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best-scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the need for an a priori pose for initialization, and it solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to the existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by 12% and translation accuracy by 22% on real scenes, despite not requiring any initialization pose. At the same time, our method operates near real-time, reaching 15 fps on consumer hardware.
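
To illustrate how the intersection of a bundle of rays can yield a camera center in closed form, the sketch below uses a standard weighted least-squares line intersection: it finds the point minimizing the weighted sum of squared distances to all rays. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`intersect_rays`, `solve3`), the per-ray `weights` argument standing in for binding scores, and the toy ray data are all hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length direction")
    return [c / n for c in v]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no unique intersection")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x


def intersect_rays(origins, directions, weights=None):
    """Point closest to all rays in the weighted least-squares sense.

    Each ray i contributes the projector P_i = I - d_i d_i^T, which measures
    the component of (p - o_i) orthogonal to the ray. The minimizer solves
    (sum_i w_i P_i) p = sum_i w_i P_i o_i.
    `weights` is a hypothetical stand-in for per-ray binding scores.
    """
    if weights is None:
        weights = [1.0] * len(origins)
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d, w in zip(origins, directions, weights):
        d = normalize(d)
        for r in range(3):
            for c in range(3):
                proj = (1.0 if r == c else 0.0) - d[r] * d[c]
                a[r][c] += w * proj
                b[r] += w * proj * o[c]
    return solve3(a, b)


# Toy data (assumed): four rays leaving surface points and aimed at one center.
center = (1.0, 2.0, 3.0)
origins = [(0.0, 0.0, 0.0), (4.0, 0.0, 1.0), (0.0, 5.0, 2.0), (-1.0, -1.0, 6.0)]
directions = [tuple(c - o for c, o in zip(center, org)) for org in origins]
estimate = intersect_rays(origins, directions)
```

With noise-free rays the estimate coincides with the true center up to floating-point error. With noisy rays, the weights let higher-scoring rays pull the solution more strongly, and the solve stays a single linear system with no iterations and no initial pose.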
