Network Pruning via Transformable Architecture Search

Abstract

Network pruning reduces the computation costs of an over-parameterized network without performance damage. Prevailing pruning algorithms pre-define the width and depth of the pruned networks, and then transfer parameters from the unpruned network to the pruned networks. To break the structure limitation of the pruned networks, we propose to apply neural architecture search to search directly for a network with flexible channel and layer sizes. The number of channels/layers is learned by minimizing the loss of the pruned networks. The feature map of the pruned network is an aggregation of K feature map fragments (generated by K networks of different sizes), which are sampled based on the probability distribution. The loss can be back-propagated not only to the network weights, but also to the parameterized distribution to explicitly tune the size of the channels/layers. Specifically, we apply channel-wise interpolation to keep feature maps with different channel sizes aligned in the aggregation procedure. The size with the maximum probability in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e.g., knowledge distillation, from the original networks. Experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate the effectiveness of our new perspective of network pruning compared to traditional network pruning algorithms. Various searching and knowledge transfer approaches are evaluated to show the effectiveness of the two components. Code is at: https://github.com/D-X-Y/NAS-Projects.
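To make the aggregation step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It makes several assumptions that the abstract does not specify: a feature map is reduced to a list of per-channel values (spatial dimensions omitted), the size distribution is a softmax over learnable logits, the K sizes are sampled without replacement, a fragment is the first c channels of the full feature, and the channel-wise interpolation is linear. All function names (`softmax`, `interpolate_channels`, `aggregate_fragments`, `select_size`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn size logits into a probability distribution (assumed parameterization)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_channels(feature, out_channels):
    """Linearly resample a per-channel vector to `out_channels` entries.

    Linear interpolation is an assumption; the abstract only says that
    channel-wise interpolation keeps fragments of different widths aligned.
    """
    n = len(feature)
    if out_channels == n:
        return list(feature)
    if out_channels == 1:
        return [sum(feature) / n]
    out = []
    for i in range(out_channels):
        pos = i * (n - 1) / (out_channels - 1)  # position in the source vector
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * feature[lo] + frac * feature[hi])
    return out


def aggregate_fragments(full_feature, candidate_sizes, logits, k=2, rng=None):
    """Aggregate K sampled fragments into one feature vector.

    `full_feature` must have at least max(candidate_sizes) channels.
    Returns the aggregated feature and the sampled candidate indices.
    """
    rng = rng or random.Random(0)
    probs = softmax(logits)

    # Sample k distinct candidate sizes according to the distribution.
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(k):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)

    # Align every fragment to the largest sampled width, then take a
    # probability-weighted sum (weights renormalized over the sampled set).
    target = max(candidate_sizes[i] for i in chosen)
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * target
    for i in chosen:
        fragment = full_feature[: candidate_sizes[i]]
        aligned = interpolate_channels(fragment, target)
        w = probs[i] / norm
        for c in range(target):
            out[c] += w * aligned[c]
    return out, chosen


def select_size(candidate_sizes, logits):
    """After the search, keep the size with the maximum probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidate_sizes[best]


if __name__ == "__main__":
    feature = [float(c) for c in range(16)]
    sizes = [4, 8, 16]
    logits = [0.1, 2.0, -1.0]
    mixed, sampled = aggregate_fragments(feature, sizes, logits, k=2)
    print("sampled sizes:", [sizes[i] for i in sampled])
    print("aggregated width:", len(mixed))
    print("selected width:", select_size(sizes, logits))
```

In an actual training setup the weights would be differentiable with respect to the logits, so the loss gradient could reach the distribution as the abstract describes; this sketch only illustrates the forward computation.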
