Learnability of high-dimensional targets by two-parameter models and gradient flow

Abstract

We explore the theoretical possibility of learning $d$-dimensional targetswith $W$-parameter models by gradient flow (GF) when $W<d$. Our main resultshows that if the targets are described by a particular $d$-dimensionalprobability distribution, then there exist models with as few as two parametersthat can learn the targets with arbitrarily high success probability. On theother hand, we show that for $W<d$ there is necessarily a large subset ofGF-non-learnable targets. In particular, the set of learnable targets is notdense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the$W$-dimensional sphere contains non-learnable targets. Finally, we observe thatthe model in our main theorem on almost guaranteed two-parameter learning isconstructed using a hierarchical procedure and as a result is not expressibleby a single elementary function. We show that this limitation is essential inthe sense that most models written in terms of elementary functions cannotachieve the learnability demonstrated in this theorem.

Quick Read (beta)

loading the full paper ...