Abstract
The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet it has not achieved competitive performance on popular leaderboards for graph-level prediction compared to mainstream GNN variants. It therefore remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture and attains excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight for applying the Transformer to graphs is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. In addition, we mathematically characterize the expressive power of Graphormer and show that, with our ways of encoding the structural information of graphs, many popular GNN variants can be covered as special cases of Graphormer.