Abstract
Combining randomized estimators in an ensemble, such as via random forests,has become a fundamental technique in modern data science, but can becomputationally expensive. Furthermore, the mechanism by which this improvespredictive performance is poorly understood. We address these issues in thecontext of sparse linear regression by proposing and analyzing an ensemble ofgreedy forward selection estimators that are randomized by feature subsampling-- at each iteration, the best feature is selected from within a random subset.We design a novel implementation based on dynamic programming that greatlyimproves its computational efficiency. Furthermore, we show via carefulnumerical experiments that our method can outperform popular methods such aslasso and elastic net across a wide range of settings. Next, contrary toprevailing belief that randomized ensembling is analogous to shrinkage, we showvia numerical experiments that it can simultaneously reduce training error anddegrees of freedom, thereby shifting the entire bias-variance trade-off curveof the base estimator. We prove this fact rigorously in the setting oforthogonal features, in which case, the ensemble estimator rescales theordinary least squares coefficients with a two-parameter family of logisticweights, thereby enlarging the model search space. These results enhance ourunderstanding of random forests and suggest that implicit regularization ingeneral may have more complicated effects than explicit regularization.