Abstract
Learning feature correspondence is a foundational task in computer vision, holding immense importance for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels, heralding a new era for self-supervised correspondence learning. Specifically, we formulate the problem of correspondence learning as a bilevel optimization, which takes the reprojection error from bundle adjustment as a supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point to effectively back-propagate the implicit gradients through bundle adjustment. Through extensive experiments, we demonstrate superior performance on tasks including feature matching and pose estimation, in which we obtain an average accuracy gain of 30% over the state-of-the-art matching models. This preprint corresponds to the Accepted Manuscript in European Conference on Computer Vision (ECCV) 2024.
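The bilevel formulation and the stationary-point argument mentioned above can be sketched in generic notation. This is an illustrative sketch only, not the paper's exact formulation: the symbols $f_\theta$ (the correspondence model with parameters $\theta$), $\mathbf{x}$ (the camera poses and landmarks optimized by bundle adjustment), and $\mathcal{L}$ (the reprojection error) are assumed here, as is the simplification that the upper-level loss and the lower-level bundle adjustment objective are the same reprojection error.

```latex
% Assumed notation (not taken from the abstract):
%   f_\theta    : correspondence model with parameters \theta
%   \mathbf{x}  : poses and landmarks optimized by bundle adjustment
%   \mathcal{L} : reprojection error, used at both levels (assumption)
\begin{align}
  \min_{\theta}\quad & \mathcal{L}\bigl(f_\theta, \mathbf{x}^{*}\bigr) \\
  \text{s.t.}\quad   & \mathbf{x}^{*} = \arg\min_{\mathbf{x}} \mathcal{L}\bigl(f_\theta, \mathbf{x}\bigr)
\end{align}
% At a stationary point of the lower-level problem,
% \partial\mathcal{L}/\partial\mathbf{x} vanishes at \mathbf{x}^{*},
% so the term that would require differentiating through the solver drops out:
\begin{equation}
  \frac{d\mathcal{L}}{d\theta}
  = \frac{\partial\mathcal{L}}{\partial\theta}\bigg|_{\mathbf{x}^{*}}
  + \underbrace{\frac{\partial\mathcal{L}}{\partial\mathbf{x}}\bigg|_{\mathbf{x}^{*}}}_{=\,0}
    \frac{d\mathbf{x}^{*}}{d\theta}
  = \frac{\partial\mathcal{L}}{\partial\theta}\bigg|_{\mathbf{x}^{*}}
\end{equation}
```

Under these assumptions, the gradient with respect to $\theta$ can be evaluated at the converged bundle adjustment solution without unrolling the solver's iterations, which is one way to read the claim of avoiding large memory and computation overhead.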