Abstract
One of the central goals of Recurrent Neural Networks (RNNs) is to learn long-term dependencies in sequential data. Nevertheless, the most popular training method, Truncated Backpropagation through Time (TBPTT), categorically forbids learning dependencies beyond the truncation horizon. In contrast, the online training algorithm Real Time Recurrent Learning (RTRL) provides untruncated gradients, with the disadvantage of impractically large computational costs. Recently published approaches reduce these costs by providing noisy approximations of RTRL. We present a new approximation algorithm for RTRL, Optimal Kronecker-Sum Approximation (OK). We prove that OK is optimal for a class of approximations of RTRL, which includes all approaches published so far. Additionally, we show that OK has empirically negligible noise: unlike previous algorithms, it matches TBPTT in a real-world task (character-level Penn TreeBank) and can exploit online parameter updates to outperform TBPTT in a synthetic string memorization task.
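
For orientation, the exact RTRL recursion that the abstract contrasts with TBPTT can be sketched for a small vanilla tanh RNN. This is plain RTRL, not the OK approximation; the network, the final-step squared loss, and all function names are illustrative assumptions rather than anything taken from the paper. The sketch carries the Jacobian of the hidden state with respect to the recurrent weights forward in time and checks the resulting untruncated gradient against finite differences.

```python
import numpy as np

def run_rnn(W, U, xs, h0):
    """Forward pass of an assumed vanilla RNN: h_t = tanh(W h_{t-1} + U x_t)."""
    h = h0
    for x in xs:
        h = np.tanh(W @ h + U @ x)
    return h

def rtrl_gradient(W, U, xs, h0):
    """Gradient of 0.5 * ||h_T||^2 w.r.t. W, computed online via RTRL."""
    n = W.shape[0]
    h = h0
    # G holds dh_t / dvec(W) (row-major vec); its n x n^2 size is the
    # source of the large memory and compute cost of exact RTRL.
    G = np.zeros((n, n * n))
    for x in xs:
        # Immediate Jacobian of the pre-activation w.r.t. vec(W).
        F = np.kron(np.eye(n), h[None, :])
        h_new = np.tanh(W @ h + U @ x)
        D = np.diag(1.0 - h_new ** 2)  # derivative of tanh
        G = D @ (W @ G + F)            # forward-mode recursion
        h = h_new
    # dL/dh_T = h_T for the assumed loss 0.5 * ||h_T||^2.
    return (h @ G).reshape(n, n)

def finite_difference_gradient(W, U, xs, h0, eps=1e-6):
    """Central-difference reference gradient for the same loss."""
    def loss(Wc):
        return 0.5 * np.sum(run_rnn(Wc, U, xs, h0) ** 2)
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            grad[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n, m, T = 4, 3, 20
W = 0.5 * rng.standard_normal((n, n))
U = 0.5 * rng.standard_normal((n, m))
xs = rng.standard_normal((T, m))
h0 = np.zeros(n)

rtrl = rtrl_gradient(W, U, xs, h0)
fd = finite_difference_gradient(W, U, xs, h0)
print(np.allclose(rtrl, fd, atol=1e-6))
```

The matrix `G` has n rows and n² columns for an n-unit network, and each update multiplies it by an n × n matrix, which illustrates why exact RTRL becomes impractical as n grows and why approximations that store `G` in a compressed form are of interest.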