Pitfalls of Graph Neural Network Evaluation

Abstract

Semi-supervised node classification in graphs is a fundamental problem in graph mining, and the recently proposed graph neural networks (GNNs) have achieved unparalleled results on this task. Due to their massive success, GNNs have attracted a lot of attention, and many novel architectures have been put forward. In this paper we show that existing evaluation strategies for GNN models have serious shortcomings. We show that using the same train/validation/test splits of the same datasets, as well as making significant changes to the training procedure (e.g., early stopping criteria), precludes a fair comparison of different architectures. We perform a thorough empirical evaluation of four prominent GNN models and show that considering different splits of the data leads to dramatically different rankings of models. Even more importantly, our findings suggest that simpler GNN architectures are able to outperform the more sophisticated ones if the hyperparameters and the training procedure are tuned fairly for all models.
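The central recommendation is to stop relying on a single fixed train/validation/test split and instead evaluate each model over many random splits. A minimal sketch of that idea, assuming NumPy and a class-balanced labeled set, is shown below. The helper names `random_class_balanced_split` and `evaluate_over_splits`, the default of 20 training and 30 validation nodes per class, and the `train_and_score` callback are all hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def random_class_balanced_split(labels, n_train_per_class=20, n_val_per_class=30, seed=None):
    """Hypothetical helper: draw one random train/val/test split of node indices.

    A fixed number of nodes per class goes to train and validation (illustrative
    defaults); every remaining node is assigned to the test set.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        # Shuffle the indices of the nodes that belong to class c
        class_nodes = rng.permutation(np.flatnonzero(labels == c))
        if len(class_nodes) < n_train_per_class + n_val_per_class:
            raise ValueError(f"class {c} has too few nodes for the requested split")
        train_idx.extend(class_nodes[:n_train_per_class])
        val_idx.extend(class_nodes[n_train_per_class:n_train_per_class + n_val_per_class])
    train_idx = np.array(sorted(train_idx), dtype=int)
    val_idx = np.array(sorted(val_idx), dtype=int)
    # Nodes not used for training or validation are held out for testing
    test_idx = np.setdiff1d(np.arange(len(labels)), np.concatenate([train_idx, val_idx]))
    return train_idx, val_idx, test_idx


def evaluate_over_splits(labels, train_and_score, n_splits=10):
    """Hypothetical helper: score a model on several random splits.

    `train_and_score(train_idx, val_idx, test_idx)` is an assumed user-supplied
    callback that trains a model (with a fixed training procedure) and returns
    its test accuracy. Returns the mean and standard deviation over splits.
    """
    scores = []
    for seed in range(n_splits):
        tr, va, te = random_class_balanced_split(labels, seed=seed)
        scores.append(train_and_score(tr, va, te))
    return float(np.mean(scores)), float(np.std(scores))


if __name__ == "__main__":
    # Toy labels: 3 classes with 100 nodes each
    labels = np.repeat([0, 1, 2], 100)
    tr, va, te = random_class_balanced_split(labels, seed=0)
    print(len(tr), len(va), len(te))
```

Reporting the mean and spread across splits, rather than a single number on one split, makes it visible when a difference between two models is smaller than the variation caused by the choice of split. The same training procedure (for example, the same early stopping rule) should be used inside `train_and_score` for every model being compared.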
