Abstract
This paper introduces a modular, non-deep learning method for filtering and refining sparse correspondences in image matching. Assuming that motion flow within the scene can be approximated by local homography transformations, matches are aggregated into overlapping clusters corresponding to virtual planes using an iterative RANSAC-based approach, and non-conforming correspondences are discarded. Moreover, the underlying planar structure provides an explicit map between the local patches associated with the matches, enabling optional refinement of keypoint positions through cross-correlation template matching after patch reprojection. To enhance robustness and fault tolerance against violations of the piecewise planar approximation, a further strategy minimizes relative patch distortion in the plane reprojection by introducing an intermediate homography that projects both patches into a common plane. The proposed method is extensively evaluated on standard datasets and image matching pipelines, and compared with state-of-the-art approaches. Unlike other current comparisons, the proposed benchmark also takes into account the more general and practical real-world cases where camera intrinsics are unavailable. Experimental results demonstrate that the proposed non-deep learning, geometry-based approach achieves performance that is either superior to or on par with recent state-of-the-art deep learning methods. Finally, this study suggests that there is still room for development in current image matching solutions along the considered research direction, which could be incorporated into novel deep image matching architectures in the future.
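To make the plane-clustering idea concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes a plain greedy scheme: RANSAC repeatedly fits a homography to the matches not yet explained, every match consistent with that homography joins the cluster (so clusters may overlap), and matches belonging to no cluster are discarded. The function names (`fit_homography`, `transfer_error`, `cluster_planes`) and all thresholds are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: least-squares H such that dst ~ H @ src."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def transfer_error(H, src, dst):
    """Pixel distance between the H-projected src points and the dst points."""
    pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = pts[:, :2] / pts[:, 2:3]
    err = np.linalg.norm(proj - dst, axis=1)
    # Degenerate projections (division by zero) are treated as non-inliers.
    return np.nan_to_num(err, nan=np.inf)


def cluster_planes(src, dst, thresh=3.0, iters=500, min_inliers=8,
                   max_planes=10, seed=0):
    """Greedy multi-homography RANSAC (assumed scheme, hypothetical defaults).

    Returns a list of (H, inlier_mask) clusters and a boolean mask of the
    matches kept, i.e. those belonging to at least one cluster.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    unassigned = np.ones(len(src), dtype=bool)
    clusters = []
    for _ in range(max_planes):
        pool = np.flatnonzero(unassigned)
        if len(pool) < min_inliers:
            break
        best_score, best_H, best_inliers = -1, None, None
        for _ in range(iters):
            # Minimal sample: 4 matches drawn from the still-unexplained ones.
            sample = rng.choice(pool, size=4, replace=False)
            H = fit_homography(src[sample], dst[sample])
            # Inliers are tested on ALL matches, which lets clusters overlap.
            inliers = transfer_error(H, src, dst) < thresh
            # Only newly explained matches count towards the score.
            score = np.count_nonzero(inliers & unassigned)
            if score > best_score:
                best_score, best_H, best_inliers = score, H, inliers
        if best_score < min_inliers:
            break
        clusters.append((best_H, best_inliers))
        unassigned &= ~best_inliers
    return clusters, ~unassigned
```

Here `src` and `dst` are N×2 arrays of matched keypoint coordinates in the two images. The returned mask selects the correspondences that survive filtering, and each cluster's homography supplies the patch-to-patch map that a subsequent refinement step could use.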