On the Provable Generalization of Recurrent Neural Networks

Abstract

Recurrent Neural Network (RNN) is a fundamental structure in deep learning.Recently, some works study the training process of over-parameterized neuralnetworks, and show that over-parameterized networks can learn functions in somenotable concept classes with a provable generalization error bound. In thispaper, we analyze the training and generalization for RNNs with randominitialization, and provide the following improvements over recent works: 1) For a RNN with input sequence $x=(X_1,X_2,...,X_L)$, previous works studyto learn functions that are summation of $f(\beta^T_lX_l)$ and requirenormalized conditions that $||X_l||\leq\epsilon$ with some very small$\epsilon$ depending on the complexity of $f$. In this paper, using detailedanalysis about the neural tangent kernel matrix, we prove a generalizationerror bound to learn such functions without normalized conditions and show thatsome notable concept classes are learnable with the numbers of iterations andsamples scaling almost-polynomially in the input length $L$. 2) Moreover, we prove a novel result to learn N-variables functions of inputsequence with the form $f(\beta^T[X_{l_1},...,X_{l_N}])$, which do not belongto the "additive" concept class, i,e., the summation of function $f(X_l)$. Andwe show that when either $N$ or $l_0=\max(l_1,..,l_N)-\min(l_1,..,l_N)$ issmall, $f(\beta^T[X_{l_1},...,X_{l_N}])$ will be learnable with the numberiterations and samples scaling almost-polynomially in the input length $L$.

Quick Read (beta)

loading the full paper ...