Loss Landscape Characterization of Neural Networks without Over-Parametrization

Abstract

Optimization methods play a crucial role in modern machine learning, powering the remarkable empirical achievements of deep learning models. These successes are even more remarkable given the complex non-convex nature of the loss landscape of these models. Yet, ensuring the convergence of optimization methods requires specific structural conditions on the objective function that are rarely satisfied in practice. One prominent example is the widely recognized Polyak-Lojasiewicz (PL) inequality, which has gained considerable attention in recent years. However, validating such assumptions for deep neural networks entails substantial and often impractical levels of over-parametrization. In order to address this limitation, we propose a novel class of functions that can characterize the loss landscape of modern deep models without requiring extensive over-parametrization and can also include saddle points. Crucially, we prove that gradient-based optimizers possess theoretical guarantees of convergence under this assumption. Finally, we validate the soundness of our new function class through both theoretical analysis and empirical experimentation across a diverse range of deep learning models.
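
For reference, the PL inequality mentioned in the abstract is usually stated as below. This is the standard textbook form, not the new function class the paper proposes. The notation is assumed here for illustration: f is the objective, f^* its minimum value, mu > 0 the PL constant, L the smoothness constant of the gradient, and x_k the gradient descent iterates.

```latex
% PL inequality: the squared gradient norm dominates the suboptimality gap.
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
  \qquad \text{for all } x .
\]
% Consequence for an L-smooth f under gradient descent with step size 1/L,
% x_{k+1} = x_k - (1/L) \nabla f(x_k): a linear convergence rate.
\[
  f(x_{k}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_{0}) - f^{*}\bigr).
\]
```

Under this condition every stationary point is a global minimizer, so the PL inequality by itself excludes saddle points; the abstract notes that the proposed class can include them.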
