Convolution, attention and structure embedding

Abstract

Deep neural networks are composed of layers of parametrised linear operations intertwined with non-linear activations. In basic models, such as the multi-layer perceptron, a linear layer operates on a simple input vector embedding of the instance being processed, and produces an output vector embedding by straight multiplication by a matrix parameter. In more complex models, the input and output are structured and their embeddings are higher order tensors. The parameter of each linear operation must then be controlled so as not to explode with the complexity of the structures involved. This is essentially the role of convolution models, which exist in many flavours depending on the type of structure they deal with (grids, networks, time series, etc.). We present here a unified framework which aims at capturing the essence of these diverse models, allowing a systematic analysis of their properties and their mutual enrichment. We also show that attention models naturally fit in the same framework: attention is convolution in which the structure itself is adaptive, and learnt, instead of being given a priori.
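As an informal illustration of the last point, the sketch below contrasts a linear layer whose mixing structure is fixed in advance with one whose structure is computed from the input. It is not the framework of the paper: the form Y = A X W, the function names, and the choice of a row-normalised 1-D grid and of scaled dot-product scores are all assumptions made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def structured_linear(X, A, W):
    """Mix the rows of X along the structure A, then apply the shared weights W.

    The parameter W has shape (d_in, d_out), whatever the number n of nodes.
    """
    return A @ X @ W

def fixed_structure(n):
    """Structure given a priori: a 1-D grid linking each node to itself and its
    immediate neighbours, row-normalised (assumed for illustration)."""
    A = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    return A / A.sum(axis=1, keepdims=True)

def learnt_structure(X, Wq, Wk):
    """Adaptive structure: scaled dot-product scores between projected nodes,
    normalised row-wise (one common form of attention, assumed here)."""
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wq.shape[1])
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
n, d_in, d_out, d_k = 5, 4, 3, 2          # hypothetical sizes
X = rng.normal(size=(n, d_in))            # one embedding vector per node
W = rng.normal(size=(d_in, d_out))        # shared linear parameter
Wq = rng.normal(size=(d_in, d_k))         # query projection
Wk = rng.normal(size=(d_in, d_k))         # key projection

Y_conv = structured_linear(X, fixed_structure(n), W)              # structure given
Y_attn = structured_linear(X, learnt_structure(X, Wq, Wk), W)     # structure learnt
```

In both calls the same routine is applied and the parameter W does not grow with n; only the origin of the structure matrix A differs.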
