Abstract
VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: VIO Front End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and poses. Based on this output, the mapping module incrementally constructs and maintains a 2D Gaussian map. Key components of the 2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement, which collectively improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS) capabilities of Gaussian Splatting for loop closure detection and correction of the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable presence of dynamic objects in real-world outdoor scenes. Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in terms of mapping and rendering quality. Furthermore, we developed a mobile app and verified that our framework can generate high-quality Gaussian maps in real time using only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge, VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in outdoor environments and supporting kilometer-scale large scenes.