### Abstract

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
