Abstract
We present a theoretically well-founded deep learning algorithm for nonparametric regression. It uses over-parametrized deep neural networks with logistic activation function, which are fitted to the given data via gradient descent. We propose a special topology of these networks, a special random initialization of the weights, and a data-dependent choice of the learning rate and the number of gradient descent steps. We prove a theoretical bound on the expected $L_2$ error of this estimate, and illustrate its finite sample size performance by applying it to simulated data. Our results show that a theoretical analysis of deep learning which takes into account simultaneously optimization, generalization and approximation can result in a new deep learning estimate which has an improved finite sample performance.
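The following is a minimal sketch of the general mechanism only. It is not the authors' estimate. The abstract does not specify the network topology, the random initialization, or the data-dependent rules for the learning rate and the number of steps, so everything here is a labeled assumption: the two hidden layers, the width, the uniform initialization range, the fixed `lr=0.02` and `steps=500`, and the helper names `init_params`, `forward`, `backward` and `fit`. The sketch fits a network with logistic activation $\sigma(z)=1/(1+e^{-z})$ and more weights than data points by plain gradient descent on the empirical $L_2$ risk $\frac{1}{n}\sum_{i=1}^n |f(X_i)-Y_i|^2$, using simulated data.

```python
import numpy as np

def sigma(z):
    # Logistic (sigmoid) activation function.
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, width, rng, scale=1.0):
    # Hypothetical uniform random initialization (NOT the paper's scheme).
    # Output weights start at zero, so the initial estimate is identically 0.
    W1 = rng.uniform(-scale, scale, size=(width, d))
    b1 = rng.uniform(-scale, scale, size=width)
    W2 = rng.uniform(-scale, scale, size=(width, width))
    b2 = rng.uniform(-scale, scale, size=width)
    w3 = np.zeros(width)
    b3 = 0.0
    return [W1, b1, W2, b2, w3, b3]

def forward(params, X):
    # Two hidden layers with logistic activation, linear output layer.
    W1, b1, W2, b2, w3, b3 = params
    H1 = sigma(X @ W1.T + b1)
    H2 = sigma(H1 @ W2.T + b2)
    return H2 @ w3 + b3, (H1, H2)

def backward(params, X, y, out, cache):
    # Gradient of the empirical L2 risk (1/n) * sum (out - y)^2.
    W1, b1, W2, b2, w3, b3 = params
    H1, H2 = cache
    n = X.shape[0]
    dout = 2.0 * (out - y) / n
    g_w3 = H2.T @ dout
    g_b3 = dout.sum()
    dZ2 = np.outer(dout, w3) * H2 * (1.0 - H2)   # sigma' = sigma * (1 - sigma)
    g_W2 = dZ2.T @ H1
    g_b2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2) * H1 * (1.0 - H1)
    g_W1 = dZ1.T @ X
    g_b1 = dZ1.sum(axis=0)
    return [g_W1, g_b1, g_W2, g_b2, g_w3, g_b3]

def fit(X, y, width=20, lr=0.02, steps=500, seed=0):
    # Plain gradient descent with a FIXED learning rate and step count
    # (the paper chooses both data-dependently; this sketch does not).
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], width, rng)
    history = []
    for _ in range(steps):
        out, cache = forward(params, X)
        history.append(float(np.mean((out - y) ** 2)))
        grads = backward(params, X, y, out, cache)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history

# Simulated regression data: Y = sin(2*pi*X) + noise, n = 100 points,
# while the network has 481 weights (over-parametrized in this sense).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.0, 1.0, size=(100, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * data_rng.standard_normal(100)

params, history = fit(X, y)
print(f"empirical L2 risk: {history[0]:.4f} -> {history[-1]:.4f}")
```

Initializing the output weights at zero makes the first gradient step update only the output layer, which keeps the start of the iteration stable for a small fixed learning rate. This is a choice made for the sketch, not a detail taken from the paper.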