Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network

Abstract

Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of "unrolling" an RNN is routinely presented without justification throughout the literature. The goal of this paper is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in signal processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement from which the RNN unrolling technique follows. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the "Vanilla LSTM" network through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and our method for presenting the LSTM system emphasize ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this tutorial valuable as well.
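The abstract refers to the LSTM cell and to "unrolling" without stating either. For orientation only, here is a minimal pure-Python sketch of a simplified, conventional LSTM cell. It has input, forget, and output gates plus a tanh candidate, and it omits peephole connections. The cell is applied to a sequence by unrolling, meaning the same cell with the same weights is reused at every time step. This is an illustrative assumption, not the paper's derivation, notation, or augmented LSTM variant. The symbols (W, U, b, i, f, o, g), the helper names, and the small random initialization are all hypothetical choices that follow common textbook usage.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def mul(a: Vector, b: Vector) -> Vector:
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Hypothetical small uniform initialization, chosen only for illustration.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Simplified LSTM cell (no peepholes); an assumed textbook form."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: input "i", forget "f", output "o",
        # and candidate "g". W acts on the input x, U on the previous state h.
        self.params: Dict[str, Tuple[Matrix, Matrix, Vector]] = {
            gate: (
                random_matrix(hidden_size, input_size, rng),
                random_matrix(hidden_size, hidden_size, rng),
                [0.0] * hidden_size,
            )
            for gate in "ifog"
        }

    def _preactivation(self, gate: str, x: Vector, h: Vector) -> Vector:
        W, U, b = self.params[gate]
        return add(matvec(W, x), matvec(U, h), b)

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = sigmoid(self._preactivation("i", x, h))  # input gate
        f = sigmoid(self._preactivation("f", x, h))  # forget gate
        o = sigmoid(self._preactivation("o", x, h))  # output gate
        g = tanh(self._preactivation("g", x, h))     # candidate update
        c_new = add(mul(f, c), mul(i, g))            # new cell state
        h_new = mul(o, tanh(c_new))                  # new hidden state
        return h_new, c_new


def unroll(cell: LSTMCell, xs: List[Vector]) -> List[Vector]:
    """Apply the same cell (shared weights) at each step, from zero states."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs: List[Vector] = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    cell = LSTMCell(input_size=3, hidden_size=4, seed=0)
    hs = unroll(cell, [[1.0, 0.0, 0.5], [0.2, -0.3, 0.1]])
    print(len(hs), [round(v, 4) for v in hs[-1]])
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every hidden-state component stays strictly inside (-1, 1). The cell state has no such bound. Training (backpropagation through the unrolled steps) is not shown here.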
