Providing theoretical learning guarantees to Deep Learning Networks

Abstract

Deep Learning (DL) is one of the most common subjects when Machine Learningand Data Science approaches are considered. There are clearly two movementsrelated to DL: the first aggregates researchers in quest to outperform otheralgorithms from literature, trying to win contests by considering often smalldecreases in the empirical risk; and the second investigates overfittingevidences, questioning the learning capabilities of DL classifiers. Motivatedby such opposed points of view, this paper employs the Statistical LearningTheory (SLT) to study the convergence of Deep Neural Networks, with particularinterest in Convolutional Neural Networks. In order to draw theoreticalconclusions, we propose an approach to estimate the Shattering coefficient ofthose classification algorithms, providing a lower bound for the complexity oftheir space of admissible functions, a.k.a. algorithm bias. Based on suchestimator, we generalize the complexity of network biases, and, next, we studyAlexNet and VGG16 architectures in the point of view of their Shatteringcoefficients, and number of training examples required to provide theoreticallearning guarantees. From our theoretical formulation, we show the conditionswhich Deep Neural Networks learn as well as point out another issue: DLbenchmarks may be strictly driven by empirical risks, disregarding thecomplexity of algorithms biases.

Quick Read (beta)

loading the full paper ...