Abstract
The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights and find the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
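The abstract does not spell out the procedure, so the following is only a minimal sketch of one way the idea could be realized, not the authors' implementation. It assumes that new neurons are inserted between two fully connected layers, that their incoming weights are set to zero so the network output is unchanged (for an activation with f(0) = 0, such as ReLU), and that their outgoing weights are taken from the top left singular vectors of the product of downstream gradients and upstream activations. The function name `gradmax_style_init` and the parameters `grad_next`, `act_prev`, `k`, and `scale` are hypothetical names introduced for illustration.

```python
import numpy as np


def gradmax_style_init(grad_next, act_prev, k, scale=1.0):
    """Sketch of an SVD-based initialization for k new hidden neurons.

    grad_next: (batch, d_out) gradients of the loss with respect to the
               pre-activations of the layer that follows the new neurons.
    act_prev:  (batch, d_in) activations that feed the new neurons.
    Returns (w_in, w_out) with shapes (k, d_in) and (d_out, k).
    Assumes k <= min(d_out, d_in).
    """
    # Sum over the batch of outer products grad_next_i * act_prev_i^T.
    m = grad_next.T @ act_prev  # shape (d_out, d_in)

    # Left singular vectors, ordered by decreasing singular value.
    u, _, _ = np.linalg.svd(m, full_matrices=False)

    # Outgoing weights: top-k left singular vectors, rescaled so that
    # their Frobenius norm equals `scale` (an assumed constraint).
    w_out = u[:, :k]
    w_out = w_out * (scale / np.linalg.norm(w_out))

    # Incoming weights: zeros, so the new neurons contribute nothing yet.
    w_in = np.zeros((k, act_prev.shape[1]))
    return w_in, w_out


# Usage with random stand-in data (not results from the paper).
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5))
acts = rng.normal(size=(32, 7))
w_in, w_out = gradmax_style_init(grads, acts, k=3, scale=0.5)

# The new neurons output ReLU(0) = 0, so the next layer's input is unchanged.
new_hidden = np.maximum(acts @ w_in.T, 0.0)
extra_contribution = new_hidden @ w_out.T
```

With zero incoming weights the added term `extra_contribution` is identically zero, which is what "without impacting what is already learned" requires in this sketch. The choice of singular vectors is meant to illustrate how an SVD can pick outgoing directions that make the gradients of the new incoming weights large.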