Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

Abstract

Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and the bootstrap. The theoretical properties of such estimates have been extensively studied in low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observations $n$. However, a unifying methodology accompanied by a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n,p \rightarrow \infty$ and $n/p \rightarrow \delta>1$ ($\delta$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, namely leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques. A cornerstone of our analysis is a bound that we obtain on the discrepancy between the `residuals' obtained from AMP and LOOCV. This connection not only yields more refined information about the AMP, ALO, and LOOCV estimates, but also offers an upper bound on the convergence rate of each estimate.
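To make the objects in the abstract concrete, the following is a minimal sketch of leave-one-out risk estimation with $n/p = 2$. It assumes ridge regression as the estimator, which the abstract does not specify, and the function names, the penalty `lam`, and the simulated Gaussian data are all hypothetical choices made for illustration. For ridge, the leverage-based shortcut $r_i / (1 - H_{ii})$ reproduces the exact leave-one-out residual, so the two functions should agree up to numerical error; for general estimators a one-step approximation of this kind is only approximate.

```python
import numpy as np


def loocv_ridge_bruteforce(X, y, lam):
    """Exact LOOCV squared-error risk estimate: refit ridge n times."""
    n, p = X.shape
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        Xi, yi = X[keep], y[keep]
        beta_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        errs[i] = (y[i] - X[i] @ beta_i) ** 2
    return errs.mean()


def loocv_ridge_shortcut(X, y, lam):
    """Same quantity from a single fit, using the hat-matrix diagonal."""
    n, p = X.shape
    # H = X (X^T X + lam I)^{-1} X^T maps y to the fitted values.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))  # leave-one-out residuals
    return np.mean(loo_resid ** 2)


def simulate(n=200, p=100, noise_sd=1.0, seed=0):
    """Hypothetical Gaussian design with n/p = 2 (an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p) / np.sqrt(p)
    y = X @ beta + noise_sd * rng.normal(size=n)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    print("brute-force LOOCV:", loocv_ridge_bruteforce(X, y, lam=1.0))
    print("shortcut LOOCV:   ", loocv_ridge_shortcut(X, y, lam=1.0))
```

The brute-force version costs $n$ refits, while the shortcut needs only one, which is the practical motivation for approximate leave-one-out schemes when $n$ and $p$ are both large.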
