Abstract
Recent advances in monocular depth estimation (MDE) methods and their improved accuracy open new possibilities for their application. In this paper, we investigate how monocular depth estimates can be used for relative pose estimation. In particular, we are interested in whether using monocular depth estimates improves results over traditional point-based methods. We propose a novel framework for estimating the relative pose of two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined only up to an unknown scale, or up to both an unknown scale and an unknown shift, our solvers estimate the scale, or both the scale and the shift, jointly with the relative pose. We derive efficient solvers considering different types of depths for three camera configurations: (1) two calibrated cameras, (2) two cameras with an unknown shared focal length, and (3) two cameras with unknown different focal lengths. Our new solvers outperform state-of-the-art depth-aware solvers in terms of speed and accuracy. In extensive real-world experiments on multiple datasets and with various MDE methods, we discuss which depth-aware solvers are preferable in which situation. The code will be made publicly available.
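
To make the scale ambiguity concrete, the sketch below covers only the simplest setting mentioned above: two calibrated cameras whose depths are each known up to an unknown scale, with no shift. It is not one of the solvers proposed in the paper. It swaps in a standard least-squares similarity alignment (Umeyama's method) of the back-projected points, and it assumes noise-free synthetic data. The function name `similarity_from_depths` and all variable names are hypothetical. If the two depth maps are off by scales a1 and a2, the back-projected points satisfy P2 = s R P1 + t' with s = a1 / a2, so the scale ratio, the rotation, and the translation direction can be recovered together.

```python
import numpy as np


def similarity_from_depths(x1, x2, m1, m2):
    """Estimate (s, R, t) with P2 ~ s * R @ P1 + t via Umeyama alignment.

    x1, x2: (N, 3) normalized homogeneous image points [x, y, 1].
    m1, m2: (N,) monocular depths, each correct up to an unknown scale.
    Hypothetical helper for illustration, not the paper's minimal solver.
    """
    # Back-project each correspondence using its (scaled) depth.
    P1 = x1 * m1[:, None]
    P2 = x2 * m2[:, None]

    # Center both point sets.
    mu1, mu2 = P1.mean(axis=0), P2.mean(axis=0)
    A, B = P1 - mu1, P2 - mu2

    # Cross-covariance between target (P2) and source (P1) points.
    sigma = B.T @ A / len(P1)
    U, D, Vt = np.linalg.svd(sigma)

    # Reflection guard so that R is a proper rotation (det = +1).
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))

    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (A ** 2).sum(axis=1).mean()
    t = mu2 - s * (R @ mu1)
    return s, R, t


# Synthetic, noise-free example (all values are assumptions).
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(10, 3))

angle = 0.1  # rotation about the y-axis, in radians
c, sn = np.cos(angle), np.sin(angle)
R_gt = np.array([[c, 0.0, sn], [0.0, 1.0, 0.0], [-sn, 0.0, c]])
t_gt = np.array([0.5, 0.1, -0.2])
X2 = X @ R_gt.T + t_gt  # the same points in the second camera frame

a1, a2 = 2.0, 0.5        # unknown per-image depth scales
x1 = X / X[:, 2:3]       # normalized image points, camera 1
x2 = X2 / X2[:, 2:3]     # normalized image points, camera 2
m1 = X[:, 2] / a1        # "monocular" depths, camera 1
m2 = X2[:, 2] / a2       # "monocular" depths, camera 2

s_est, R_est, t_est = similarity_from_depths(x1, x2, m1, m2)
t_dir_est = t_est / np.linalg.norm(t_est)
t_dir_gt = t_gt / np.linalg.norm(t_gt)
```

In this sketch the translation is recovered only up to scale, as is usual for relative pose, so only its direction is compared against the ground truth. Handling an additional unknown shift or unknown focal lengths requires the dedicated solvers the paper derives and is not attempted here.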