Graph Kernel Attention Transformers

Abstract

We introduce a new class of graph neural networks (GNNs), by combiningseveral concepts that were so far studied independently - graph kernels,attention-based networks with structural priors and more recently, efficientTransformers architectures applying small memory footprint implicit attentionmethods via low rank decomposition techniques. The goal of the paper istwofold. Proposed by us Graph Kernel Attention Transformers (or GKATs) are muchmore expressive than SOTA GNNs as capable of modeling longer-range dependencieswithin a single layer. Consequently, they can use more shallow architecturedesign. Furthermore, GKAT attention layers scale linearly rather thanquadratically in the number of nodes of the input graphs, even when thosegraphs are dense, requiring less compute than their regular graph attentioncounterparts. They achieve it by applying new classes of graph kernelsadmitting random feature map decomposition via random walks on graphs. As abyproduct of the introduced techniques, we obtain a new class of learnablegraph sketches, called graphots, compactly encoding topological graphproperties as well as nodes' features. We conducted exhaustive empiricalcomparison of our method with nine different GNN classes on tasks ranging frommotif detection through social network classification to bioinformaticschallenges, showing consistent gains coming from GKATs.

Quick Read (beta)

loading the full paper ...