How to train RNNs on chaotic data?

Abstract

Recurrent neural networks (RNNs) are widespread machine learning tools for modeling sequential and time series data. They are notoriously hard to train because their loss gradients, backpropagated in time, tend to saturate or diverge during training. This is known as the exploding and vanishing gradient problem. Previous solutions to this issue either built on rather complicated, purpose-engineered architectures with gated memory buffers, or, more recently, imposed constraints that ensure convergence to a fixed point or restrict (the eigenspectrum of) the recurrence matrix. Such constraints, however, severely limit the expressivity of the RNN: essential intrinsic dynamics such as multistability or chaos are disabled. This is inherently at odds with the chaotic nature of many, if not most, time series encountered in nature and society. Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights, we offer an effective yet simple training technique for chaotic data and guidance on how to choose relevant hyperparameters according to the Lyapunov spectrum.
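The link between gradients and the Lyapunov spectrum can be illustrated in one dimension. For a recurrence x_{t+1} = f(x_t), the chain rule gives dx_T/dx_0 = f'(x_{T-1}) · ... · f'(x_0), and the maximal Lyapunov exponent is the long-run average lambda_max = lim (1/T) log|dx_T/dx_0|, so |dx_T/dx_0| grows roughly like exp(lambda_max · T). The sketch below is a minimal illustration under stated assumptions, not the authors' method: it uses the logistic map x -> r·x·(1 - x) as a hypothetical stand-in for a one-dimensional RNN, and the function names (logistic_orbit, log_jacobian_product, max_lyapunov_exponent) and parameter choices are made up for this example.

```python
import math


def logistic_orbit(r: float, x0: float, steps: int, burn_in: int = 1000) -> list[float]:
    """Iterate x -> r*x*(1-x), discarding an initial transient of `burn_in` steps.

    Hypothetical stand-in for the latent orbit of a 1-D RNN (assumption for illustration).
    """
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(steps):
        orbit.append(x)
        x = r * x * (1.0 - x)
    return orbit


def log_jacobian_product(r: float, orbit: list[float]) -> float:
    """Return log|dx_T/dx_0| = sum_t log|f'(x_t)|, with f'(x) = r*(1 - 2x).

    Summing logs avoids overflow/underflow of the raw product of derivatives.
    """
    return sum(math.log(abs(r * (1.0 - 2.0 * x))) for x in orbit)


def max_lyapunov_exponent(r: float, x0: float = 0.2, steps: int = 20000) -> float:
    """Estimate lambda_max as the time-averaged log-derivative along the orbit."""
    orbit = logistic_orbit(r, x0, steps)
    return log_jacobian_product(r, orbit) / steps


if __name__ == "__main__":
    horizon = 100  # hypothetical backpropagation horizon T
    for r in (2.8, 4.0):  # stable fixed point vs. chaotic regime
        lam = max_lyapunov_exponent(r)
        # Typical gradient magnitude over the horizon: |dx_T/dx_0| ~ exp(lambda_max * T)
        print(f"r={r}: lambda_max ~ {lam:.3f}, |dx_T/dx_0| ~ {math.exp(lam * horizon):.3g}")
```

For r = 2.8 the orbit settles on a stable fixed point where f' = -0.8, so lambda_max = ln 0.8 ≈ -0.223 and the gradient stays bounded (here it even decays). For r = 4 the map is chaotic with lambda_max = ln 2 ≈ 0.693, so the gradient over T steps grows exponentially in T, which mirrors the bounded-versus-diverging dichotomy stated above.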
