Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning

Abstract

Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks, which is crucial for deploying models on memory- and power-constrained devices. While recent sparse learning methods have shown promising performance up to moderate sparsity levels such as 95% and 98%, accuracy quickly deteriorates when sparsity is pushed to extreme levels. Obtaining sparse networks at such extreme sparsity levels presents unique challenges, such as fragile gradient flow and a heightened risk of layer collapse. In this work, we explore network performance beyond the commonly studied sparsities and propose a collection of techniques that enable networks to keep learning without accuracy collapse even at extreme sparsities, including 99.90%, 99.95% and 99.99% on ResNet architectures. Our approach combines 1) Dynamic ReLU phasing, where DyReLU initially allows for richer parameter exploration before being gradually replaced by standard ReLU, 2) weight sharing, which reuses parameters within a residual layer while maintaining the same number of learnable parameters, and 3) cyclic sparsity, where both sparsity levels and sparsity patterns evolve dynamically throughout training to better encourage parameter exploration. We evaluate our method, which we term Extreme Adaptive Sparse Training (EAST), at extreme sparsities using ResNet-34 and ResNet-50 on CIFAR-10, CIFAR-100, and ImageNet, achieving significant performance improvements over the state-of-the-art methods we compared against.
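The abstract names cyclic sparsity and DyReLU phasing but gives no formulas for either. The plain-Python sketch below illustrates one possible reading of each, under loudly stated assumptions: the cosine-shaped sparsity dip, the linear DyReLU-to-ReLU blend, magnitude pruning as the mask criterion, and the fixed-coefficient DyReLU stand-in are all illustrative choices rather than EAST's actual schedules, and every function name here (`cyclic_sparsity`, `magnitude_mask`, `dyrelu_stub`, `phased_activation`) is hypothetical.

```python
import math


def cyclic_sparsity(step, total_steps, target, num_cycles=4, amplitude=0.02):
    """Hypothetical cyclic sparsity schedule (an assumption, not EAST's formula).

    Valid for 0 <= step <= total_steps. Each cycle starts and ends at `target`;
    in between, sparsity dips below it by a cosine-shaped bump whose peak
    shrinks linearly to zero, so training ends exactly at `target` sparsity.
    """
    cycle_len = total_steps / num_cycles
    pos = (step % cycle_len) / cycle_len       # position within the current cycle, in [0, 1)
    decay = 1.0 - step / total_steps           # shrink the dip as training progresses
    dip = amplitude * decay * 0.5 * (1.0 - math.cos(2.0 * math.pi * pos))
    return max(0.0, target - dip)


def magnitude_mask(weights, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of `weights`.

    At least one weight is always kept, a crude guard against layer collapse
    when called per layer at extreme sparsity. If the mask is recomputed from
    dense (shadow) weights whenever the level changes (an assumed usage), the
    sparsity pattern can shift along with the sparsity level.
    """
    n_keep = max(1, round(len(weights) * (1.0 - sparsity)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def dyrelu_stub(x, coeffs=((1.0, 0.0), (0.25, 0.0))):
    """Stand-in for DyReLU: max_k (a_k * x + b_k).

    Real DyReLU predicts a_k and b_k from the input with a small hypernetwork;
    here they are fixed constants purely for illustration.
    """
    return max(a * x + b for a, b in coeffs)


def phased_activation(x, step, phase_end):
    """Blend DyReLU into plain ReLU; the DyReLU weight decays linearly to 0 at `phase_end`."""
    alpha = max(0.0, 1.0 - step / phase_end)
    return alpha * dyrelu_stub(x) + (1.0 - alpha) * max(0.0, x)


if __name__ == "__main__":
    dense = [0.5, -3.0, 0.1, 2.0, -0.01, 1.5, 0.0, -0.7]
    for step in (0, 125, 250):
        s = cyclic_sparsity(step, total_steps=1000, target=0.75, amplitude=0.25)
        print(step, round(s, 4), magnitude_mask(dense, s))
    print([phased_activation(-2.0, t, phase_end=100) for t in (0, 50, 100)])
```

In a real sparse-training loop the mask would be applied per layer to framework tensors and the DyReLU coefficients would be learned; the plain-Python version only makes the shape of each schedule easy to inspect.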
