Comparing regularisation paths of (conjugate) gradient estimators in ridge regression

Abstract

We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimizing a penalized ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent nonlinearities and dependencies. On the other hand, standard gradient flow is a linear method with well known regularizing properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice.
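To make the three regularisation paths concrete, the following is a minimal sketch, not the authors' implementation. It assumes the unnormalised criterion 0.5*||y - Xb||^2 + 0.5*lam*||b||^2, a start at zero, textbook conjugate gradients applied to the normal equations, and simulated Gaussian data. All function names, the parameter grids and the simulation design are hypothetical choices made for illustration only.

```python
import numpy as np


def ridge_path(X, y, penalties):
    """Ridge estimators for a grid of penalties, computed via the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    # Spectral filter s / (s^2 + mu) applied to each singular direction.
    return [Vt.T @ (s / (s**2 + mu) * uty) for mu in penalties]


def gradient_flow_path(X, y, times, lam=0.0):
    """Gradient flow on the (assumed) penalized criterion, started at zero.

    Closed form: b(t) = A^{-1} (I - exp(-t A)) X^T y with A = X^T X + lam I.
    Assumes all singular values of X are positive when lam = 0.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = s**2 + lam
    uty = U.T @ y
    # -expm1(-t d) equals 1 - exp(-t d), evaluated stably for small t.
    return [Vt.T @ (-np.expm1(-t * d) / d * s * uty) for t in times]


def conjugate_gradient_path(X, y, n_iter, lam=0.0):
    """Textbook CG iterates for (X^T X + lam I) b = X^T y, started at zero.

    Returns the iterates b_0, ..., b_k; the path is shorter than n_iter + 1
    only if the normal-equation residual vanishes exactly.
    """
    b = np.zeros(X.shape[1])
    r = X.T @ y          # residual of the normal equations at b = 0
    d = r.copy()         # first search direction
    path = [b.copy()]
    for _ in range(n_iter):
        rr = r @ r
        if rr == 0.0:    # exact solution reached, nothing left to do
            break
        Ad = X.T @ (X @ d) + lam * d
        alpha = rr / (d @ Ad)
        b = b + alpha * d
        r = r - alpha * Ad
        d = r + (r @ r) / rr * d
        path.append(b.copy())
    return path


def prediction_error(X, b, beta_true):
    """In-sample prediction error ||X (b - beta_true)||^2 / n."""
    return np.sum((X @ (b - beta_true)) ** 2) / X.shape[0]


# Simulated example (assumed design: Gaussian X, polynomially decaying signal).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = 1.0 / np.arange(1, p + 1)
y = X @ beta_true + rng.standard_normal(n)

paths = {
    "ridge": ridge_path(X, y, np.geomspace(1e-2, 1e4, 40)),
    "gradient flow": gradient_flow_path(X, y, np.geomspace(1e-5, 1.0, 40)),
    "conjugate gradients": conjugate_gradient_path(X, y, 15),
}
for name, path in paths.items():
    errs = [prediction_error(X, b, beta_true) for b in path]
    print(f"{name:>20s}: oracle prediction error {min(errs):.4f}")
```

The minimum over each path plays the role of the oracle choice mentioned above: it uses the unknown true coefficient vector and is therefore only available in simulations. Ridge and gradient flow are linear in y through their spectral filters, whereas the conjugate gradient step sizes depend on y, which is the nonlinearity that makes the iterates harder to analyse.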
