Abstract
We explore the theoretical possibility of learning $d$-dimensional targetswith $W$-parameter models by gradient flow (GF) when $W<d$. Our main resultshows that if the targets are described by a particular $d$-dimensionalprobability distribution, then there exist models with as few as two parametersthat can learn the targets with arbitrarily high success probability. On theother hand, we show that for $W<d$ there is necessarily a large subset ofGF-non-learnable targets. In particular, the set of learnable targets is notdense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the$W$-dimensional sphere contains non-learnable targets. Finally, we observe thatthe model in our main theorem on almost guaranteed two-parameter learning isconstructed using a hierarchical procedure and as a result is not expressibleby a single elementary function. We show that this limitation is essential inthe sense that most models written in terms of elementary functions cannotachieve the learnability demonstrated in this theorem.