Scaling and renormalization in high-dimensional regression

Abstract

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.
