Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

Abstract

Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures like lasso and ridge regression. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided.
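To make the idea of a randomized forward selection procedure concrete, the following is a minimal sketch rather than the procedure evaluated in the paper, since the abstract does not give its details. It assumes that each base model is fit on a bootstrap sample, that each greedy step considers only `mtry` randomly chosen candidate features, that candidates are ranked by residual sum of squares, and that predictions are averaged across models. The function names and the `depth` and `n_models` parameters are hypothetical.

```python
import numpy as np


def randomized_forward_selection(X, y, depth, mtry, rng):
    """Greedy forward selection that sees only `mtry` random candidates per step."""
    n, p = X.shape
    selected = []
    for _ in range(depth):
        remaining = [j for j in range(p) if j not in selected]
        if not remaining:
            break
        # Assumption: candidates are drawn uniformly without replacement.
        k = min(mtry, len(remaining))
        candidates = rng.choice(remaining, size=k, replace=False)
        best_j, best_rss = None, np.inf
        for j in candidates:
            cols = selected + [int(j)]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = int(j), rss
        selected.append(best_j)
    # Refit ordinary least squares (with intercept) on the chosen features.
    A = np.column_stack([np.ones(n), X[:, selected]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return selected, beta


def fit_ensemble(X, y, n_models=50, depth=3, mtry=2, seed=0):
    """Fit `n_models` randomized forward-selection models on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample (assumption)
        models.append(randomized_forward_selection(X[idx], y[idx], depth, mtry, rng))
    return models


def predict_ensemble(models, X):
    """Average the predictions of all base models."""
    preds = [beta[0] + X[:, sel] @ beta[1:] for sel, beta in models]
    return np.mean(preds, axis=0)
```

In this sketch, setting `mtry` equal to the number of features recovers bagged ordinary forward selection, while smaller values restrict each step's choices, which is the kind of added randomness the abstract describes as implicit regularization.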
