Abstract
While Structure-from-Motion (SfM) has seen much progress over the years, state-of-the-art systems are prone to failure when facing extreme viewpoint changes in low-overlap, low-parallax or high-symmetry scenarios. Because capturing images that avoid these pitfalls is challenging, this severely limits the wider use of SfM, especially by non-expert users. We overcome these limitations by augmenting the classical SfM paradigm with monocular depth and normal priors inferred by deep neural networks. Thanks to a tight integration of monocular and multi-view constraints, our approach significantly outperforms existing ones under extreme viewpoint changes, while maintaining strong performance in standard conditions. We also show that monocular priors can help reject faulty associations due to symmetries, which is a long-standing problem for SfM. This makes our approach the first capable of reliably reconstructing challenging indoor environments from few images. Through principled uncertainty propagation, it is robust to errors in the priors, can handle priors inferred by different models with little tuning, and will thus easily benefit from future progress in monocular depth and normal estimation. Our code is publicly available at https://github.com/cvg/mpsfm.