Exploring Low Rank Training of Deep Neural Networks

Abstract

Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and training in low rank space with additional objectives, offering various ad hoc explanations for chosen practice. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field, hinting in the process at exciting research opportunities that still need answering.
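
To make the memory argument concrete, the following is a minimal sketch of what a factorised layer could look like, assuming a weight matrix W of shape (d_out, d_in) is replaced by a product U V with U of shape (d_out, r) and V of shape (r, d_in). The function names and the layer sizes are hypothetical and chosen for illustration only; they are not taken from the paper, and the sketch does not reproduce the authors' training setup.

```python
def dense_params(d_in, d_out):
    """Parameters in an unfactorised weight matrix W of shape (d_out, d_in)."""
    return d_in * d_out


def factorised_params(d_in, d_out, rank):
    """Parameters when W is replaced by U @ V, with U: (d_out, rank), V: (rank, d_in)."""
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    """Plain-Python matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def factorised_forward(U, V, x):
    """Compute U @ (V @ x) without ever forming the full matrix U @ V."""
    return matvec(U, matvec(V, x))


# Hypothetical sizes for illustration only (assumption, not from the paper).
d_in, d_out, rank = 768, 3072, 64
full = dense_params(d_in, d_out)
low = factorised_params(d_in, d_out, rank)
print(f"dense: {full} params, factorised: {low} params, ratio: {low / full:.3f}")
```

Under these assumed sizes the factorised layer stores r(d_in + d_out) values instead of d_in * d_out, so the saving only appears when r is well below d_in * d_out / (d_in + d_out).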
