Implicit Regularization in Deep Learning May Not Be Explainable by Norms

Abstract

Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norms can explain the implicit regularization in matrix factorization. The current paper resolves this open question in the negative, by proving that there exist natural matrix factorization problems on which the implicit regularization drives all norms (and quasi-norms) towards infinity. Our results suggest that, rather than perceiving the implicit regularization via norms, a potentially more useful interpretation is minimization of rank. We demonstrate empirically that this interpretation extends to a certain class of non-linear neural networks, and hypothesize that it may be key to explaining generalization in deep learning.
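The tension between rank and norms can be made concrete with a small hypothetical completion problem (an illustrative construction chosen here, not necessarily the exact instance analyzed in the paper): a 2×2 matrix with observed entries W12 = W21 = 1 and W22 = 0, and W11 unobserved. Every completion has determinant -1, so the product of its two singular values is fixed at 1. Driving the smaller singular value toward zero, which lowers the effective rank toward 1, therefore forces the unobserved entry, and with it the Frobenius and nuclear norms, to grow without bound. The sketch below tabulates this trade-off with NumPy; the effective-rank measure used (exponentiated Shannon entropy of the normalized singular values) is one common choice and is an assumption of this example.

```python
import numpy as np


def completion(x: float) -> np.ndarray:
    """Hypothetical 2x2 completion: observed W12 = 1, W21 = 1, W22 = 0; x fills the unobserved W11."""
    return np.array([[x, 1.0], [1.0, 0.0]])


def effective_rank(w: np.ndarray) -> float:
    """Effective rank: exp of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(w, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so that 0 * log(0) is treated as 0
    return float(np.exp(-np.sum(p * np.log(p))))


# Every completion has det = x * 0 - 1 * 1 = -1, so sigma_max * sigma_min = 1.
# Growing |x| shrinks sigma_min (effective rank -> 1) while all norms grow.
for x in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    w = completion(x)
    s = np.linalg.svd(w, compute_uv=False)
    print(
        f"x={x:>7}: det={np.linalg.det(w):+.3f}  "
        f"fro={np.linalg.norm(w):9.3f}  nuc={s.sum():9.3f}  "
        f"sigma_min={s[-1]:.2e}  eff_rank={effective_rank(w):.4f}"
    )
```

In this toy setting no completion attains rank 1 exactly, so any procedure that keeps lowering the effective rank must send the unobserved entry to infinity; this mirrors, in miniature, the abstract's claim that rank minimization and norm minimization can pull in opposite directions.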