GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs

Abstract

Finding local correspondences between images with different viewpoints requires local descriptors that are robust against geometric transformations. An approach for transformation invariance is to integrate out the transformations by pooling the features extracted from transformed versions of an image. However, the feature pooling may sacrifice the distinctiveness of the resulting descriptors. In this paper, we introduce a novel visual descriptor named Group Invariant Feature Transform (GIFT), which is both discriminative and robust to geometric transformations. The key idea is that the features extracted from the transformed versions of an image can be viewed as a function defined on the group of the transformations. Instead of feature pooling, we use group convolutions to exploit underlying structures of the extracted features on the group, resulting in descriptors that are both discriminative and provably invariant to the group of transformations. Extensive experiments show that GIFT outperforms state-of-the-art methods on several benchmark datasets and practically improves the performance of relative pose estimation.
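The contrast between pooling and group convolution can be illustrated with a toy sketch. It is not the GIFT network: it assumes a cyclic group of four 90-degree rotations, a single random linear filter in place of a CNN, and hand-picked group filters, all chosen here only for illustration. The names group_features, group_conv, pooled_descriptor and gconv_descriptor are hypothetical.

```python
import numpy as np

N_ROT = 4  # assumed group: rotations by 0, 90, 180 and 270 degrees

def group_features(patch, w):
    """Response of one fixed filter w to every rotated copy of the patch.

    The result is a function on the group: one value per rotation.
    """
    return np.array([np.sum(np.rot90(patch, k) * w) for k in range(N_ROT)])

def group_conv(f, psi):
    """Circular correlation on the group: out[g] = sum_h f[g + h] * psi[h]."""
    n = len(f)
    return np.array(
        [sum(f[(g + h) % n] * psi[h] for h in range(n)) for g in range(n)]
    )

def pooled_descriptor(f):
    """Baseline: max-pool the raw features over the group."""
    return np.max(f)

def gconv_descriptor(f, psis):
    """Group convolution with each filter in psis, then max-pool over the group."""
    return np.array([np.max(group_conv(f, psi)) for psi in psis])

rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5))
w = rng.standard_normal((5, 5))
psis = rng.standard_normal((3, N_ROT))

f = group_features(patch, w)
f_rot = group_features(np.rot90(patch), w)

# Rotating the input only permutes (cyclically shifts) the features on the group.
shift_ok = np.allclose(f_rot, np.roll(f, -1))

# Group convolution commutes with that shift, so pooling afterwards is invariant.
invariant_ok = np.allclose(gconv_descriptor(f, psis), gconv_descriptor(f_rot, psis))

# Two feature functions that are not rotations of each other but pool identically.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 3.0, 2.0, 4.0])
pair_filter = [np.array([1.0, 1.0, 0.0, 0.0])]
same_when_pooled = pooled_descriptor(a) == pooled_descriptor(b)
differ_after_gconv = not np.allclose(
    gconv_descriptor(a, pair_filter), gconv_descriptor(b, pair_filter)
)
```

In this sketch, direct pooling cannot tell a from b because it discards the order of the values along the group axis, whereas the group convolution looks at neighbouring group elements before pooling and so keeps them apart while remaining invariant to rotation of the input.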
