How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Abstract

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while multilayer perceptrons (MLPs) do not extrapolate well in simple tasks, Graph Neural Networks (GNNs), structured networks with MLP modules, have some success in more complex tasks. We provide a theoretical explanation and identify conditions under which MLPs and GNNs extrapolate well. We start by showing that ReLU MLPs trained by gradient descent converge quickly to linear functions along any direction from the origin, which suggests that ReLU MLPs cannot extrapolate well in most non-linear tasks. On the other hand, ReLU MLPs can provably converge to a linear target function when the training distribution is "diverse" enough. These observations lead to a hypothesis: GNNs can extrapolate well in dynamic programming (DP) tasks if we encode appropriate non-linearity in the architecture and input representation. We provide theoretical and empirical support for the hypothesis. Our theory explains previous extrapolation success and suggests its limitations: successful extrapolation relies on incorporating task-specific non-linearity, which often requires domain knowledge or extensive model search.
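
The linearity claim can be illustrated numerically with a minimal sketch. It checks only a weaker, purely architectural fact: a finite ReLU MLP is piecewise linear, so along a ray from the origin it has finitely many kinks and is affine beyond the last one. It does not reproduce the result about gradient-descent training or the convergence rate. The helper names `relu_mlp` and `second_differences_along_ray`, the layer sizes, and the random untrained weights are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def relu_mlp(x, weights, biases):
    """Forward pass of a ReLU MLP; the last layer is linear (no activation)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]


def second_differences_along_ray(f, direction, ts):
    """Second finite differences of t -> f(t * direction) on an evenly spaced grid.

    They are zero wherever f is affine along the ray on the sampled points.
    """
    points = ts[:, None] * direction[None, :]
    values = f(points).ravel()
    return values[2:] - 2.0 * values[1:-1] + values[:-2]


# Assumption: a small random (untrained) network stands in for a trained one.
rng = np.random.default_rng(0)
sizes = [2, 16, 16, 1]
weights = [rng.normal(size=(m, n)) / np.sqrt(m)
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

direction = np.array([0.6, 0.8])  # unit vector from the origin


def network(x):
    return relu_mlp(x, weights, biases)


# Near the origin the kinks of the ReLU units typically show up as
# non-zero second differences; far along the ray they should vanish.
near = second_differences_along_ray(network, direction, np.linspace(0.0, 5.0, 50))
far = second_differences_along_ray(network, direction, np.linspace(1e4, 2e4, 50))
print("max |second difference| near the origin:", np.abs(near).max())
print("max |second difference| far along the ray:", np.abs(far).max())
```

Because the far-range behavior is affine, such a network cannot track a target that keeps curving outside the training range (for example a quadratic), which is consistent with the abstract's point that the required non-linearity has to come from the architecture or the input representation.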
