Rademacher upper bounds for cross-validation errors with an application to the lasso

  • 2020-07-30 17:13:03
  • Ning Xu, Timothy C. G. Fisher, Jian Hong

Abstract

We establish a general upper bound for $K$-fold cross-validation ($K$-CV) errors that can be adapted to many $K$-CV-based estimators and learning algorithms. Based on Rademacher complexity of the model and the Orlicz-$\Psi_{\nu}$ norm of the error process, the CV error upper bound applies to both light-tail and heavy-tail error distributions. We also extend the CV error upper bound to $\beta$-mixing data using the technique of independent blocking. We provide a Python package (\texttt{CVbound}, \url{https://github.com/isaac2math}) for computing the CV error upper bound in $K$-CV-based algorithms. Using the lasso as an example, we demonstrate in simulations that the upper bounds are tight and stable across different parameter settings and random seeds. As well as accurately bounding the CV errors for the lasso, the minimizer of the new upper bounds can be used as a criterion for variable selection. Compared with the CV-error minimizer, simulations show that tuning the lasso penalty parameter according to the minimizer of the upper bound yields a more sparse and more stable model that retains all of the relevant variables.
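
To make the Rademacher-complexity ingredient concrete, the following is a minimal sketch, not the \texttt{CVbound} API and not the paper's full bound (the Orlicz-norm term and the $K$-CV structure are omitted). It assumes the lasso is viewed in its constrained form, i.e. linear predictors with $\|w\|_1 \le B$, for which the supremum inside the empirical Rademacher complexity has the closed form $\frac{B}{n}\max_j \left|\sum_i \sigma_i x_{ij}\right|$ by $\ell_1$/$\ell_\infty$ duality. The function names, the Gaussian design, and the sample sizes are hypothetical choices made for illustration.

```python
import random


def empirical_rademacher_l1(X, radius, n_draws=200, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    the class {x -> <w, x> : ||w||_1 <= radius} on the rows of X.

    Hypothetical helper for illustration; not part of CVbound.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw i.i.d. Rademacher signs sigma_i in {-1, +1}.
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # l1/l-infinity duality gives the supremum over the l1 ball:
        # sup_w (1/n) sum_i sigma_i <w, x_i>
        #   = (radius / n) * max_j |sum_i sigma_i x_ij|
        col_sums = [sum(sigma[i] * X[i][j] for i in range(n)) for j in range(p)]
        total += radius * max(abs(s) for s in col_sums) / n
    return total / n_draws


def make_gaussian_design(n, p, seed=1):
    """Assumed toy design: i.i.d. standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


X_small = make_gaussian_design(50, 10)
X_large = make_gaussian_design(400, 10)

r_small = empirical_rademacher_l1(X_small, radius=1.0)
r_large = empirical_rademacher_l1(X_large, radius=1.0)
r_small_doubled = empirical_rademacher_l1(X_small, radius=2.0)

print("n=50 :", r_small)
print("n=400:", r_large)
print("n=50, radius doubled:", r_small_doubled)
```

In this sketch the estimate shrinks as the sample grows and scales linearly with the $\ell_1$ radius, which is the qualitative behaviour a complexity-based bound relies on when it trades off model size against sample size.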

 
