### Abstract

A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) withparameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$, if for any$d$-dimensional subspace $W\subseteq R^n$, $P\big(\,\forall_{x\in W}\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq(1+\epsilon)\|x\|\,\big)\geq 1-\delta.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$,and for any $\theta > 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$is an OSE with $\epsilon = O_\theta(1)$. However, such optimal embeddingdimension is not known for other embeddings. Of particular interest are sparseOSEs, having $s\ll m$ non-zeros per column, with applications to problems suchas least squares regression and low-rank approximation. We show that, given any $\theta > 0$, an $m\times n$ random matrix $S$ with$m\geq (1+\theta)d$ consisting of randomly sparsified $\pm1/\sqrt s$ entriesand having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspaceembedding with $\epsilon = O_{\theta}(1)$. Our result addresses the main openquestion posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparseOSEs can achieve $m=O(d)$ embedding dimension, and it improves on$m=O(d\log(d))$ shown by Cohen (SODA 2016). We use this to construct the firstoblivious subspace embedding with $O(d)$ embedding dimension that can beapplied faster than current matrix multiplication time, and to obtain anoptimal single-pass algorithm for least squares regression. We further extendour results to construct even sparser non-oblivious embeddings, leading to thefirst subspace embedding with low distortion $\epsilon=o(1)$ and optimalembedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrixmultiplication time.