Skip-Convolutions for Efficient Video Processing

Abstract

We propose Skip-Convolutions to leverage the large amount of redundancy in video streams and save computation. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction, e.g. foreground regions, or can be safely skipped, e.g. background regions. These gates can either be implemented as an efficient network trained jointly with the convolution kernels, or can simply skip the residuals based on their magnitude. Gating functions can also incorporate block-wise sparsity structures, as required for efficient implementation on hardware platforms. By replacing all convolutions with Skip-Convolutions in two state-of-the-art architectures, namely EfficientDet and HRNet, we reduce their computational cost consistently by a factor of 3 to 4x for two different tasks, without any accuracy drop. Extensive comparisons with existing model compression methods, as well as image and video efficiency methods, demonstrate that Skip-Convolutions set a new state of the art by effectively exploiting the temporal redundancies in videos.
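
The residual formulation rests on the linearity of convolution: the output for the current frame equals the previous output plus the convolution of the frame-to-frame residual. The following is a minimal 1D sketch of that idea with a magnitude-based gate, not the authors' implementation. The function names, the `threshold` parameter, and the choice to gate on input residual magnitude are all assumptions made for illustration.

```python
def conv1d(signal, kernel):
    """Plain 1D cross-correlation without padding ("valid" mode)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def skip_conv1d(frame, prev_frame, prev_output, kernel, threshold=0.1):
    """Update prev_output using only residuals that pass a magnitude gate.

    Hypothetical helper: returns the new output and the number of output
    positions that were actually recomputed.
    """
    # Residual between consecutive frames.
    residual = [a - b for a, b in zip(frame, prev_frame)]
    # Assumed gate: drop residuals whose magnitude is below the threshold.
    gated = [r if abs(r) > threshold else 0.0 for r in residual]

    k = len(kernel)
    output = list(prev_output)
    computed = 0
    for i in range(len(output)):
        window = gated[i:i + k]
        if any(window):  # skip positions whose whole window was gated out
            output[i] += sum(w * c for w, c in zip(window, kernel))
            computed += 1
    return output, computed


# Toy "video": a static background where one sample changes between frames.
kernel = [1, 2, 1]
prev_frame = [1, 1, 1, 1, 1, 1, 1, 1]
frame = [1, 1, 1, 5, 1, 1, 1, 1]

prev_output = conv1d(prev_frame, kernel)
new_output, computed = skip_conv1d(frame, prev_frame, prev_output, kernel)
dense_output = conv1d(frame, kernel)
```

In this toy case only the output positions whose receptive field overlaps the changed sample are recomputed, and the result matches the dense convolution of the current frame. With a larger threshold, small residuals are dropped and the result becomes an approximation.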
