Gradient-Normalized Smoothness for Optimization with Approximate Hessians

Abstract

In this work, we develop new optimization algorithms that use approximatesecond-order information combined with the gradient regularization technique toachieve fast global convergence rates for both convex and non-convexobjectives. The key innovation of our analysis is a novel notion calledGradient-Normalized Smoothness, which characterizes the maximum radius of aball around the current point that yields a good relative approximation of thegradient field. Our theory establishes a natural intrinsic connection betweenHessian approximation and the linearization of the gradient. Importantly,Gradient-Normalized Smoothness does not depend on the specific problem class ofthe objective functions, while effectively translating local information aboutthe gradient field and Hessian approximation into the global behavior of themethod. This new concept equips approximate second-order algorithms withuniversal global convergence guarantees, recovering state-of-the-art rates forfunctions with H\"older-continuous Hessians and third derivatives,quasi-self-concordant functions, as well as smooth classes in first-orderoptimization. These rates are achieved automatically and extend to broaderclasses, such as generalized self-concordant functions. We demonstrate directapplications of our results for global linear rates in logistic regression andsoftmax problems with approximate Hessians, as well as in non-convexoptimization using Fisher and Gauss-Newton approximations.

Quick Read (beta)

loading the full paper ...