Fast Sparse ConvNets

Abstract

Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher efficiency but also higher accuracy, and found wide adoption in the field. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new, the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On Snapdragon 835, our sparse networks outperform their dense equivalents by $1.3$-$2.4\times$, equivalent to approximately one entire generation of MobileNet-family improvement. We hope that our findings will facilitate wider adoption of sparsity as a tool for creating efficient and accurate deep learning architectures.
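To make "replacing a dense primitive with its sparse counterpart" concrete, here is a minimal sketch in plain Python. It assumes, for illustration only, that the dense primitive is a 1x1 (pointwise) convolution, which can be computed as the matrix product of a weight matrix of shape C_out x C_in and an activation matrix of shape C_in x (H*W). The sparse version stores the weights in compressed sparse row (CSR) form and skips zero weights, so its multiply-add count scales with the number of nonzero weights rather than with C_out*C_in. The helper names (dense_to_csr, dense_conv1x1, sparse_conv1x1) are hypothetical; they are not the XNNPACK API and are not the ARM or WebAssembly kernels described above.

```python
# Illustrative sketch (assumption): a 1x1 convolution as W (C_out x C_in) @ X (C_in x H*W).
# Hypothetical helpers, not XNNPACK code. Each function also counts multiply-adds (MACs).
from typing import List, Tuple

Matrix = List[List[float]]
CSR = Tuple[List[float], List[int], List[int]]  # (values, column indices, row pointers)


def dense_to_csr(w: Matrix) -> CSR:
    """Convert a dense weight matrix to CSR, keeping only nonzero entries."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in w:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr


def dense_conv1x1(w: Matrix, x: Matrix) -> Tuple[Matrix, int]:
    """Dense 1x1 conv: every weight (including zeros) touches every pixel."""
    c_out, c_in, hw = len(w), len(x), len(x[0])
    y = [[0.0] * hw for _ in range(c_out)]
    macs = 0
    for o in range(c_out):
        for i in range(c_in):
            for p in range(hw):
                y[o][p] += w[o][i] * x[i][p]
                macs += 1
    return y, macs


def sparse_conv1x1(csr: CSR, c_out: int, x: Matrix) -> Tuple[Matrix, int]:
    """Sparse 1x1 conv: only nonzero weights are visited, each scaling one input row."""
    values, col_idx, row_ptr = csr
    hw = len(x[0])
    y = [[0.0] * hw for _ in range(c_out)]
    macs = 0
    for o in range(c_out):
        for k in range(row_ptr[o], row_ptr[o + 1]):
            v, x_row = values[k], x[col_idx[k]]
            for p in range(hw):
                y[o][p] += v * x_row[p]
                macs += 1
    return y, macs


if __name__ == "__main__":
    # 4 output channels x 3 input channels, 75% of the weights are zero.
    w = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 0.0, 0.0]]
    x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 input channels, 2 pixels
    y_dense, macs_dense = dense_conv1x1(w, x)
    y_sparse, macs_sparse = sparse_conv1x1(dense_to_csr(w), len(w), x)
    assert y_dense == y_sparse
    print(macs_dense, macs_sparse)  # prints 24 6
```

Both functions return the same output, but the sparse one performs 6 multiply-adds instead of 24. That count is the reduction in theoretical FLOPs that the abstract refers to; whether it turns into a wall-clock speedup depends on how the kernel is implemented, which is what the paper's ARM and WebAssembly kernels target.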
