When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

Abstract

The article concerns low-rank approximation of matrices generated by samplinga smooth function of two $m$-dimensional variables. We refute an argument madein the literature that, for a specific class of analytic functions, suchmatrices admit accurate entrywise approximation of rank that is independent of$m$. We provide a theoretical explanation of the numerical results presented insupport of this argument, describing three narrower classes of functions forwhich $n \times n$ function-generated matrices can be approximated within anentrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n)\varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent ofthe dimension $m$: (i) functions of the inner product of the two variables,(ii) functions of the squared Euclidean distance between the variables, and(iii) shift-invariant positive-definite kernels. We extend our argument tolow-rank tensor-train approximation of tensors generated with functions of themulti-linear product of their $m$-dimensional variables. We discuss our resultsin the context of low-rank approximation of attention in transformer neuralnetworks.

Quick Read (beta)

loading the full paper ...