MapNet: Geometry-Aware Learning of Maps for Camera Localization

Abstract

Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after tracking is lost. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g., 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation. Unlike prior work on learning maps, MapNet exploits cheap and ubiquitous sensory inputs like visual odometry and GPS in addition to images and fuses them together for camera localization. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference. In addition to directly improving localization accuracy, this allows us to update MapNet (i.e., the map) in a self-supervised manner using additional unlabeled video sequences from the scene. We also propose a novel parameterization for camera rotation that is better suited for deep-learning-based camera pose regression. Experimental results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar dataset show significant performance improvements over prior work.
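This abstract does not spell out the loss terms or the rotation parameterization. The following is a minimal NumPy sketch, assuming a rotation encoded as the logarithm of a unit quaternion and a pose loss with weights `beta` and `gamma` on the translation and rotation L1 errors. It shows how an absolute-pose loss and a relative (odometry-style) constraint between consecutive frames can be combined into one objective. The function names `qlog`, `qexp`, `pose_distance` and `combined_pose_loss`, the default values of `beta`, `gamma` and `alpha`, and the use of plain vector differences as relative poses are illustrative assumptions, not definitions taken from this text.

```python
import numpy as np


def qlog(q):
    """Log of a unit quaternion q = (u, v1, v2, v3) as a 3-vector.

    For q = (cos(theta/2), sin(theta/2) * n) this returns (theta/2) * n.
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)          # guard against slightly non-unit input
    if q[0] < 0:                       # q and -q encode the same rotation;
        q = -q                         # keep the hemisphere u >= 0
    u, v = q[0], q[1:]
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                 # identity rotation
        return np.zeros(3)
    return v / norm_v * np.arccos(np.clip(u, -1.0, 1.0))


def qexp(w):
    """Inverse of qlog: 3-vector w -> unit quaternion (cos|w|, sin|w| * w/|w|)."""
    w = np.asarray(w, dtype=float)
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(norm_w)], np.sin(norm_w) * w / norm_w))


def pose_distance(p, p_star, beta, gamma):
    """Weighted L1 distance between 6-vectors (t_x, t_y, t_z, w_1, w_2, w_3).

    Assumption: beta and gamma would be learnable scalars during training;
    in this sketch they are plain floats.
    """
    p, p_star = np.asarray(p, dtype=float), np.asarray(p_star, dtype=float)
    t_err = np.abs(p[:3] - p_star[:3]).sum()
    w_err = np.abs(p[3:] - p_star[3:]).sum()
    return t_err * np.exp(-beta) + beta + w_err * np.exp(-gamma) + gamma


def combined_pose_loss(preds, gts, beta=0.0, gamma=-3.0, alpha=1.0):
    """Absolute loss over all frames plus alpha times a relative loss over
    consecutive frame pairs. Relative poses are simplified to plain differences
    of the 6-vectors (an assumption made for this sketch)."""
    preds, gts = np.asarray(preds, dtype=float), np.asarray(gts, dtype=float)
    absolute = sum(pose_distance(p, g, beta, gamma) for p, g in zip(preds, gts))
    relative = sum(
        pose_distance(preds[i] - preds[i + 1], gts[i] - gts[i + 1], beta, gamma)
        for i in range(len(preds) - 1)
    )
    return absolute + alpha * relative


# Usage: three frames whose rotations are given as quaternions and converted with qlog.
quats = [np.array([np.cos(a), 0.0, 0.0, np.sin(a)]) for a in (0.1, 0.2, 0.3)]
gts = np.array([np.concatenate(([i, 0.0, 0.0], qlog(q))) for i, q in enumerate(quats)])
preds = gts + 0.05                     # a uniformly biased prediction
print(combined_pose_loss(preds, gts))
```

In the usage example the bias is identical for every frame, so the relative term sees no error and only the absolute term grows. A 3-vector output also avoids the unit-norm constraint that regressing a raw 4-D quaternion would require, and flipping quaternions into the hemisphere u ≥ 0 removes the q / -q sign ambiguity except exactly at 180-degree rotations.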
