Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Abstract

In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study the gradient descent or gradient flow (i.e., gradient descent with infinitesimal step size) optimizing the logistic loss or cross-entropy loss of any homogeneous model (possibly non-smooth), and show that if the training loss decreases below a certain threshold, then we can define a smoothed version of the normalized margin which increases over time. We also formulate a natural constrained optimization problem related to margin maximization, and prove that both the normalized margin and its smoothed version converge to the objective value at a KKT point of the optimization problem. Our results generalize the previous results for logistic regression with one-layer or multi-layer linear networks, and provide more quantitative convergence results with weaker assumptions than previous results for homogeneous smooth neural networks. We conduct several experiments to justify our theoretical findings on the MNIST and CIFAR-10 datasets. Finally, as the margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
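
To make the notion of a normalized margin concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a bias-free two-layer ReLU network (positively homogeneous of order 2 in its parameters), binary labels in {-1, +1}, and NumPy. The names `two_layer_relu`, `normalized_margin`, and `smoothed_margin_exp` are hypothetical. The smoothed quantity shown is a log-sum-exp surrogate for the exponential loss, chosen here for illustration; the abstract does not state the paper's exact definition.

```python
import numpy as np

def two_layer_relu(params, X):
    """Bias-free two-layer ReLU net: f(c * params, x) = c**2 * f(params, x) for c > 0."""
    W1, w2 = params
    return np.maximum(X @ W1.T, 0.0) @ w2

def param_norm(params):
    """Euclidean norm of all parameters taken as one flat vector."""
    return np.sqrt(sum(np.sum(p ** 2) for p in params))

def normalized_margin(params, X, y, order=2):
    """Smallest signed output y_n * f(x_n), divided by ||params||**order."""
    q = y * two_layer_relu(params, X)
    return q.min() / param_norm(params) ** order

def smoothed_margin_exp(params, X, y, order=2):
    """Illustrative surrogate (assumption): -logsumexp(-q) / ||params||**order.

    Since logsumexp(-q) >= max(-q) = -min(q), this never exceeds the
    normalized margin.
    """
    q = y * two_layer_relu(params, X)
    m = (-q).max()                            # shift for numerical stability
    lse = m + np.log(np.exp(-q - m).sum())
    return -lse / param_norm(params) ** order

# Toy data and random parameters, purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = np.where(rng.normal(size=8) > 0, 1.0, -1.0)
params = (rng.normal(size=(4, 3)), rng.normal(size=4))
scaled = tuple(3.0 * p for p in params)       # same direction, larger norm

gamma = normalized_margin(params, X, y)
gamma_scaled = normalized_margin(scaled, X, y)
gamma_smooth = smoothed_margin_exp(params, X, y)
```

Because the network is homogeneous, rescaling the parameters leaves `normalized_margin` unchanged, so it depends only on the direction of the parameter vector. This is why it is a natural quantity to track while the parameter norm grows during training.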
