How Attentive are Graph Attention Networks?

Abstract

Graph Attention Networks (GATs) are one of the most popular GNN architectures and are considered the state-of-the-art architecture for representation learning with graphs. In GAT, every node attends to its neighbors given its own representation as the query. However, in this paper we show that GATs can only compute a restricted kind of attention where the ranking of attended nodes is unconditioned on the query node. We formally define this restricted kind of attention as static attention and distinguish it from a strictly more expressive dynamic attention. Because GATs use a static attention mechanism, there are simple graph problems that GAT cannot express: in a controlled problem, we show that static attention hinders GAT from even fitting the training data. To remove this limitation, we introduce a simple fix by modifying the order of operations and propose GATv2: a dynamic graph attention variant that is strictly more expressive than GAT. We perform an extensive evaluation and show that GATv2 outperforms GAT across 11 OGB and other benchmarks while we match their parametric costs. Our code is available at https://github.com/tech-srl/how_attentive_are_gats.
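
To make the static versus dynamic distinction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the usual single-head scoring functions, e(h_i, h_j) = LeakyReLU(a^T [W h_i || W h_j]) for GAT and e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j]) for GATv2, scores every pair of nodes as if the graph were fully connected, and omits the softmax normalization (which is monotonic and so does not change rankings). The function names, weight shapes, and toy inputs are illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Strictly increasing, so it preserves the ordering of its inputs.
    return np.where(x > 0, x, slope * x)

def gat_scores(h, W, a):
    """GAT-style scores: e[i, j] = LeakyReLU(a^T [W h_i || W h_j]).

    Assumed shapes: h (n, d_in), W (d_out, d_in), a (2 * d_out,).
    """
    z = h @ W.T                       # (n, d_out)
    d_out = z.shape[1]
    s_query = z @ a[:d_out]           # contribution of the query node i
    s_key = z @ a[d_out:]             # contribution of the key node j
    # The score is a monotonic function of s_query[i] + s_key[j], so the
    # ranking over j cannot depend on i.
    return leaky_relu(s_query[:, None] + s_key[None, :])

def gatv2_scores(h, W, a):
    """GATv2-style scores: e[i, j] = a^T LeakyReLU(W [h_i || h_j]).

    Assumed shapes: h (n, d_in), W (d_out, 2 * d_in), a (d_out,).
    """
    d_in = h.shape[1]
    W_q, W_k = W[:, :d_in], W[:, d_in:]
    pre = (h @ W_q.T)[:, None, :] + (h @ W_k.T)[None, :, :]  # (n, n, d_out)
    # The nonlinearity is applied before the dot product with a, so query
    # and key features can interact.
    return leaky_relu(pre) @ a        # (n, n)

# GAT with arbitrary random weights: every query row has the same top key.
rng = np.random.default_rng(0)
h_rand = rng.normal(size=(5, 4))
W_gat = rng.normal(size=(3, 4))
a_gat = rng.normal(size=6)
gat_top = gat_scores(h_rand, W_gat, a_gat).argmax(axis=1)

# GATv2 with hand-picked weights: the score reduces to 0.8 * |h_i + h_j|,
# so each of the two toy nodes ranks itself highest.
h_toy = np.array([[-1.0], [1.0]])
W_v2 = np.array([[1.0, 1.0], [-1.0, -1.0]])
a_v2 = np.array([1.0, 1.0])
v2_top = gatv2_scores(h_toy, W_v2, a_v2).argmax(axis=1)

print("GAT top key per query:  ", gat_top)
print("GATv2 top key per query:", v2_top)
```

In this sketch the GAT ranking is fixed by the key term alone, which is the sense in which the ranking is unconditioned on the query node, while the toy GATv2 weights yield a different top-ranked key for each query. The toy weights are chosen by hand only to show that such a mapping is representable; they say nothing about what a trained model learns.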
