Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent

Abstract

We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.
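To make the objects named in the abstract concrete, the following is a minimal NumPy sketch of the tubal product, tubal rank, and gradient descent on an overparametrized factorization from a small random initialization. It relies on the standard fact that the tubal product becomes a slice-wise matrix product after a Fourier transform along the third mode. Everything else is an assumption made for illustration, since the abstract does not state the paper's loss, measurement model, or step-size conditions: the symmetric objective (1/4)‖U ∗ Uᵀ − T‖²_F, the function names, and the values of `alpha`, `mu`, `iters`, and the tensor sizes.

```python
import numpy as np


def t_product(A, B):
    """Tubal product of A (n1 x n2 x n3) with B (n2 x n4 x n3).

    Computed as slice-wise matrix products in the Fourier domain
    along the third mode.
    """
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))


def t_transpose(A):
    """Tubal transpose: transpose each frontal slice, then reverse
    the order of slices 2..n3."""
    At = A.transpose(1, 0, 2)
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)


def slice_singular_values(A):
    """Singular values of every Fourier-domain frontal slice, shape (n3, min(n1, n2))."""
    A_hat = np.moveaxis(np.fft.fft(A, axis=2), 2, 0)
    return np.linalg.svd(A_hat, compute_uv=False)


def tubal_rank(A, tol=1e-8):
    """Largest numerical rank among the Fourier-domain frontal slices."""
    s = slice_singular_values(A)
    return int(np.max(np.sum(s > tol * s.max(), axis=1)))


def loss(U, T):
    """Assumed objective: 0.25 * ||U * U^T - T||_F^2 (not taken from the paper)."""
    R = t_product(U, t_transpose(U)) - T
    return 0.25 * np.sum(R ** 2)


def gradient_descent(T, k, alpha=1e-3, mu=0.1, iters=500, seed=0):
    """Gradient descent on the assumed objective for a symmetric target T.

    alpha is the scale of the small random initialization, mu the step
    size; both values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, _, n3 = T.shape
    U = alpha * rng.standard_normal((n, k, n3))
    for _ in range(iters):
        R = t_product(U, t_transpose(U)) - T
        U = U - mu * t_product(R, U)  # gradient of the loss is R * U when R is symmetric
    return U


# Hypothetical demo: tubal-rank-2 target of size 8 x 8 x 4, fully overparametrized (k = 8).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2, 4))
T = t_product(X, t_transpose(X))
T = T / slice_singular_values(T).max()  # scale so every Fourier slice has spectral norm <= 1
U = gradient_descent(T, k=8)
print("tubal rank of target:", tubal_rank(T))
print("loss at U = 0:", 0.25 * np.sum(T ** 2), "final loss:", loss(U, T))
print("Fourier-slice singular values of U * U^T:")
print(slice_singular_values(t_product(U, t_transpose(U))))
```

Rerunning the demo with a larger `alpha` and comparing the printed singular values is one simple way to probe the dependence on initialization scale that the abstract emphasizes; the sketch itself makes no claim about what the paper's experiments show.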
