Training Neural Networks with Fixed Sparse Masks

Abstract

During typical gradient-based training of deep neural networks, all of themodel's parameters are updated at each iteration. Recent work has shown that itis possible to update only a small subset of the model's parameters duringtraining, which can alleviate storage and communication requirements. In thispaper, we show that it is possible to induce a fixed sparse mask on the model'sparameters that selects a subset to update over many iterations. Our methodconstructs the mask out of the $k$ parameters with the largest Fisherinformation as a simple approximation as to which parameters are most importantfor the task at hand. In experiments on parameter-efficient transfer learningand distributed training, we show that our approach matches or exceeds theperformance of other methods for training with sparse updates while being moreefficient in terms of memory usage and communication costs. We release our codepublicly to promote further applications of our approach.

Quick Read (beta)

loading the full paper ...