Generalization in Supervised Learning Through Riemannian Contraction

Abstract

We prove that Riemannian contraction in a supervised learning setting impliesgeneralization. Specifically, we show that if an optimizer is contracting insome Riemannian metric with rate $\lambda > 0$, it is uniformly algorithmicallystable with rate $\mathcal{O}(1/\lambda n)$, where $n$ is the number oflabelled examples in the training set. The results hold for stochastic anddeterministic optimization, in both continuous and discrete-time, for convexand non-convex loss surfaces. The associated generalization bounds reduce towell-known results in the particular case of gradient descent over convex orstrongly convex loss surfaces. They can be shown to be optimal in certainlinear settings, such as kernel ridge regression under gradient flow.

Quick Read (beta)

loading the full paper ...