Abstract
Visual localization is the task of estimating a camera pose in a known environment. In this paper, we utilize 3D Gaussian Splatting (3DGS)-based representations for accurate and privacy-preserving visual localization. We propose Gaussian Splatting Feature Fields (GSFFs), a scene representation for visual localization that combines an explicit geometry model (3DGS) with an implicit feature field. We leverage the dense geometric information and differentiable rasterization algorithm from 3DGS to learn robust feature representations grounded in 3D. In particular, we align a 3D scale-aware feature field and a 2D feature encoder in a common embedding space through a contrastive framework. Using a 3D structure-informed clustering procedure, we further regularize the representation learning and seamlessly convert the features to segmentations, which can be used for privacy-preserving visual localization. Localization is achieved through pose refinement, which aligns either feature maps or segmentations from a query image with those rendered from the GSFFs scene representation. The resulting privacy- and non-privacy-preserving localization pipelines, evaluated on multiple real-world datasets, show state-of-the-art performance.
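As a rough illustration of what aligning two feature sources in a common embedding space through a contrastive objective can look like, the sketch below computes an InfoNCE-style loss between paired feature vectors. The abstract does not specify the loss, so the InfoNCE form, the temperature value, and the names `info_nce`, `rendered`, and `encoded` are assumptions made purely for illustration, not the formulation used for GSFFs.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; assumes the vector is non-zero.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(rendered, encoded, temperature=0.1):
    """InfoNCE-style loss over paired features (illustrative assumption).

    rendered[i] stands in for a feature rendered from the 3D field at some
    pixel, and encoded[i] for the 2D encoder feature at the same pixel.
    Every other encoded[j] acts as a negative for rendered[i].
    """
    r = [l2_normalize(v) for v in rendered]
    e = [l2_normalize(v) for v in encoded]
    loss = 0.0
    for i, ri in enumerate(r):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(ri, ej)) / temperature for ej in e]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the target.
        loss += log_sum - logits[i]
    return loss / len(r)


# Toy check: correctly paired features give a lower loss than mismatched ones.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(aligned, aligned)
loss_shuffled = info_nce(aligned, shuffled)
```

In this toy setting the loss is small when each rendered feature is most similar to its own paired encoder feature and large when the pairing is broken, which is the behavior a contrastive alignment objective relies on.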