Fast and Accurate Least-Mean-Squares Solvers

Abstract

Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression,SVD and Elastic-Net not only solve fundamental machine learning problems, butare also the building blocks in a variety of other methods, such as decisiontrees and matrix factorizations. We suggest an algorithm that gets a finite set of $n$ $d$-dimensional realvectors and returns a weighted subset of $d+1$ vectors whose sum is\emph{exactly} the same. The proof in Caratheodory's Theorem (1907) computessuch a subset in $O(n^2d^2)$ time and thus not used in practice. Our algorithmcomputes this subset in $O(nd)$ time, using $O(\log n)$ calls to Caratheodory'sconstruction on small but "smart" subsets. This is based on a novel paradigm offusion between different data summarization techniques, known as sketches andcoresets. As an example application, we show how it can be used to boost theperformance of existing LMS solvers, such as those in scikit-learn library, upto x100. Generalization for streaming and distributed (big) data is trivial.Extensive experimental results and complete open source code are also provided.

Quick Read (beta)

loading the full paper ...