Abstract
In this paper, we introduce the Butterfly Transform (BFT), a light weightchannel fusion method that reduces the computational complexity of point-wiseconvolutions from O(n^2) of conventional solutions to O(n log n) with respectto the number of channels while improving the accuracy of the networks underthe same range of FLOPs. The proposed BFT generalizes the Discrete FourierTransform in a way that its parameters are learned at training time. Ourexperimental evaluations show that replacing channel fusion modules with \sysresults in significant accuracy gains at similar FLOPs across a wide range ofnetwork architectures. For example, replacing channel fusion convolutions withBFT offers 3% absolute top-1 improvement for MobileNetV1-0.25 and 2.5% forShuffleNet V2-0.5 while maintaining the same number of FLOPS. Notably, theShuffleNet-V2+BFT outperforms state-of-the-art architecture search methodsMNasNet \cite{tan2018mnasnet} and FBNet \cite{wu2018fbnet}. We also show thatthe structure imposed by BFT has interesting properties that ensures theefficacy of the resulting network.