Understanding Learning Invariance in Deep Linear Networks

Abstract

Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. While empirical studies suggest that data-driven methods such as regularization and data augmentation can perform comparably to explicitly invariant models, theoretical insights remain scarce. In this paper, we provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring. We focus on mean squared error regression with deep linear networks, which parametrize rank-bounded linear maps and can be hard-wired to be invariant to specific group actions. We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum. By contrast, regularization introduces additional critical points, though they remain saddles except for the global optimum. Moreover, we demonstrate that the regularization path is continuous and converges to the hard-wired solution.
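
As a rough numerical illustration of the three approaches, the sketch below compares their global optima in a heavily simplified setting. Everything specific here is an assumption made for illustration and is not taken from the paper: the group (cyclic shifts of three input coordinates), the random data, the use of a single unconstrained matrix W in place of a deep, rank-bounded factorization, and the penalty form lam * sum_g ||W g - W||_F^2 for the regularizer. Because it only solves for global minimizers in closed form, it says nothing about saddles or other critical points.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 3, 2, 50

# Assumed group G: cyclic shifts of the input coordinates (orthogonal matrices).
shift = np.roll(np.eye(d_in), 1, axis=0)
group = [np.linalg.matrix_power(shift, k) for k in range(d_in)]
P_avg = sum(group) / len(group)  # orthogonal projector onto G-invariant inputs

# Synthetic data, columns are samples (hypothetical, for illustration only).
X = rng.normal(size=(d_in, n))
Y = rng.normal(size=(d_out, n))


def lstsq_map(A, B):
    """Minimum-norm W minimizing ||B - W A||_F^2."""
    return B @ np.linalg.pinv(A)


# 1) Data augmentation: fit on every transformed copy g X with the same targets.
X_aug = np.hstack([g @ X for g in group])
Y_aug = np.hstack([Y for _ in group])
W_aug = lstsq_map(X_aug, Y_aug)

# 2) Hard-wiring: restrict to maps of the form W = V P_avg, invariant by construction.
V = lstsq_map(P_avg @ X, Y)
W_hw = V @ P_avg


# 3) Regularization: ||Y - W X||_F^2 + lam * sum_g ||W g - W||_F^2 (assumed penalty).
def regularized(lam):
    eye = np.eye(d_in)
    M = sum((g - eye) @ (g - eye).T for g in group)
    A = X @ X.T + lam * M  # symmetric positive definite for this data
    # Stationarity gives W A = Y X^T; solve the transposed system.
    return np.linalg.solve(A, (Y @ X.T).T).T


print("||W_aug - W_hw||_F =", np.linalg.norm(W_aug - W_hw))
for lam in (1.0, 1e2, 1e4, 1e8):
    print(f"lam={lam:g}  ||W_reg - W_hw||_F =", np.linalg.norm(regularized(lam) - W_hw))
```

In this toy setting the augmented and hard-wired minimizers coincide up to floating-point error, and the regularized minimizer approaches them as lam grows, which is consistent with the convergence of the regularization path stated above.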
