Feature-Based Lie Group Transformer for Real-World Applications

Abstract

The main goal of representation learning is to acquire meaningful representations from real-world sensory inputs without supervision. Representation learning explains some aspects of human development. Various neural network (NN) models have been proposed that acquire empirically good representations. However, the formulation of a good representation has not been established. We recently proposed a method for categorizing changes between a pair of sensory inputs. A unique feature of this approach is that transformations between two sensory inputs are learned to satisfy algebraic structural constraints. Conventional representation learning often assumes that disentangled, independent feature axes are a good representation; however, we found that such a representation cannot account for conditional independence. To overcome this problem, we proposed a new method using group decomposition in Galois algebra theory. Although this method is promising for defining a more general representation, it assumes pixel-to-pixel translation without feature extraction and can only process low-resolution images with no background, which prevents real-world application. In this study, we provide a simple method to apply our group decomposition theory to a more realistic scenario by combining feature extraction and object segmentation. We replace pixel translation with feature translation and formulate object segmentation as grouping features under the same transformation. We validated the proposed method on a practical dataset containing both real-world objects and backgrounds. We believe that our model will lead to a better understanding of how humans develop object recognition in the real world.
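The idea of formulating object segmentation as "grouping features under the same transformation" can be illustrated with a toy sketch. This is not the authors' model: it assumes each feature is just a 2D keypoint position observed before and after a change, and that the candidate transformations (here, an identity map for the static background and one rotation plus translation for the moving object) are already known, whereas the paper learns them. The function `group_by_transformation` and the helper `rotation` are hypothetical names introduced only for illustration.

```python
import numpy as np

def rotation(theta):
    """Hypothetical helper: 2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def group_by_transformation(x0, x1, transforms):
    """Toy grouping (assumption, not the paper's method): assign each feature
    to the candidate affine map that best explains its change.

    x0, x1: (N, 2) arrays of feature positions before and after the change.
    transforms: list of (A, b) pairs, each an affine map x -> A @ x + b.
    Returns an (N,) array of group indices into `transforms`.
    """
    residuals = []
    for A, b in transforms:
        pred = x0 @ A.T + b  # apply the candidate map to every feature at once
        residuals.append(np.linalg.norm(pred - x1, axis=1))
    # Each feature joins the group whose transformation has the smallest residual.
    return np.argmin(np.stack(residuals, axis=1), axis=1)

# Usage: static background features plus object features that rotate and shift.
rng = np.random.default_rng(0)
background = rng.uniform(-5, 5, size=(6, 2))
obj = rng.uniform(-1, 1, size=(4, 2))
A_obj, b_obj = rotation(np.pi / 6), np.array([2.0, 0.5])

x0 = np.vstack([background, obj])
x1 = np.vstack([background, obj @ A_obj.T + b_obj])
transforms = [(np.eye(2), np.zeros(2)), (A_obj, b_obj)]

labels = group_by_transformation(x0, x1, transforms)
print(labels)  # background features -> group 0, object features -> group 1
```

In this toy setting the segmentation falls out of the transformation assignment alone: features that share a transformation form one segment, without any pixel-level mask.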
