Abstract
We describe the emergence of a Convolution Bottleneck (CBN) structure in CNNs, where the network uses its first few layers to transform the input representation into a representation that is supported only along a few frequencies and channels, before using the last few layers to map back to the outputs. We define the CBN rank, which describes the number and type of frequencies that are kept inside the bottleneck, and partially prove that the parameter norm required to represent a function $f$ scales as depth times the CBN rank of $f$. We also show that the parameter norm depends at next order on the regularity of $f$. We show that any network with almost optimal parameter norm will exhibit a CBN structure in both the weights and, under the assumption that the network is stable under large learning rates, the activations, which motivates the common practice of down-sampling; and we verify that the CBN results still hold with down-sampling. Finally, we use the CBN structure to interpret the functions learned by CNNs on a number of tasks.