A Generalization of Transformer Networks to Graphs

Abstract

We propose a generalization of the transformer neural network architecture to arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), where it operates on fully connected graphs representing all connections between the words in a sequence. Such an architecture does not leverage the inductive bias of graph connectivity, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity of each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, layer normalization is replaced by batch normalization, which provides faster training and better generalization performance. Finally, the architecture is extended to edge feature representations, which can be critical to tasks such as chemistry (bond type) or link prediction (entity relationships in knowledge graphs). Numerical experiments on a graph benchmark demonstrate the performance of the proposed graph transformer architecture. This work closes the gap between the original transformer, which was designed for the limited case of line graphs, and graph neural networks, which can work with arbitrary graphs. As our architecture is simple and generic, we believe it can be used as a black box for future applications that wish to combine transformers and graphs.
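To make two of the four properties above more concrete (attention restricted to each node's neighborhood, and Laplacian eigenvectors as positional encodings), here is a minimal NumPy sketch. It is not the authors' implementation. The function names `laplacian_pe` and `neighborhood_attention` are hypothetical. The sketch assumes an undirected graph given as a dense symmetric adjacency matrix and uses the symmetric normalized Laplacian. It also assumes single-head attention without edge features or normalization layers, with random matrices standing in for learned weights.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: k Laplacian-eigenvector positional encodings per node.

    Assumes an undirected graph (dense symmetric adjacency matrix) and uses the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    """
    deg = adj.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling instead of dividing by zero.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    _, eigvecs = np.linalg.eigh(lap)
    # Drop the trivial eigenvector (eigenvalue 0) and keep the next k.
    return eigvecs[:, 1:k + 1]


def neighborhood_attention(h, adj, w_q, w_k, w_v):
    """Hypothetical single-head attention where node i attends only to its neighbors j."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[1])
    mask = adj > 0
    # Assumption: every node has at least one neighbor, otherwise its row is undefined.
    assert mask.any(axis=1).all(), "every node needs at least one neighbor"
    # Non-edges get -inf so that they receive exactly zero attention weight.
    scores = np.where(mask, scores, -np.inf)
    # Row-wise softmax, shifted by the row max for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Illustrative usage on a 6-node cycle graph (node i is linked to i-1 and i+1 mod n).
rng = np.random.default_rng(0)
n, d, k = 6, 8, 2
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

pe = laplacian_pe(adj, k)
# Project the k-dim encodings to d dims (random stand-in for a learned layer) and add them.
h = rng.normal(size=(n, d)) + pe @ rng.normal(size=(k, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = neighborhood_attention(h, adj, w_q, w_k, w_v)
```

Laplacian eigenvectors are only defined up to sign (and, for repeated eigenvalues such as those of this cycle graph, up to a rotation within the eigenspace), so a model trained on such encodings needs some robustness to that choice. Randomly flipping signs during training is one common remedy.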
