Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

Abstract

LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM's gates from the embedded simple RNN, producing a new class of RNNs where the recurrence computes an element-wise weighted sum of context-independent functions of the input. Ablations on a range of problems demonstrate that the gating mechanism alone performs as well as an LSTM in most settings, strongly suggesting that the gates are doing much more in practice than just alleviating vanishing gradients.
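The "element-wise weighted sum" reading can be illustrated with a minimal sketch. It assumes a simplified cell of the form c_t = f_t * c_{t-1} + i_t * content(x_t), where content(x_t) = tanh(Wc x_t) depends only on the current input and the gates depend on the input and the previous hidden state. The names `make_params`, `run_gated_sum`, and `explicit_weighted_sum`, the parameter layout, the omission of biases and of an output gate, and the choice of tanh are all assumptions made for illustration, not the paper's exact formulation. Under these assumptions the recurrence unrolls to a sum over past steps j of content(x_j), weighted element-wise by i_j times the product of the later forget gates.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Plain-Python matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def make_params(d_in, d, seed=0):
    # Hypothetical parameter layout: small random weights, no biases.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"Wc": mat(d, d_in), "Wi": mat(d, d_in + d), "Wf": mat(d, d_in + d)}


def run_gated_sum(xs, params):
    # Recurrent form: c_t = f_t * c_{t-1} + i_t * content(x_t), element-wise.
    d = len(params["Wc"])
    c = [0.0] * d
    h = [0.0] * d
    trace = []
    for x in xs:
        # Content is context-independent: it sees x_t only, never h_{t-1}.
        content = [math.tanh(a) for a in matvec(params["Wc"], x)]
        xh = list(x) + h
        # Gates are context-dependent: they see x_t and h_{t-1}.
        i = [sigmoid(a) for a in matvec(params["Wi"], xh)]
        f = [sigmoid(a) for a in matvec(params["Wf"], xh)]
        c = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c, i, content)]
        h = [math.tanh(ck) for ck in c]  # assumption: output gate omitted
        trace.append((i, f, content))
    return c, trace


def explicit_weighted_sum(trace):
    # Unrolled form: sum over j of (i_j * prod_{s>j} f_s) * content_j.
    d = len(trace[0][0])
    steps = len(trace)
    total = [0.0] * d
    for j in range(steps):
        i_j, _, content_j = trace[j]
        for k in range(d):
            weight = i_j[k]
            for s in range(j + 1, steps):
                weight *= trace[s][1][k]
            total[k] += weight * content_j[k]
    return total


if __name__ == "__main__":
    params = make_params(d_in=3, d=4, seed=0)
    rng = random.Random(1)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
    c_final, trace = run_gated_sum(xs, params)
    print(c_final)
    print(explicit_weighted_sum(trace))
```

Running the script prints the final memory cell computed both ways; the two vectors should agree up to floating-point error, which is the sense in which the gated recurrence is a dynamically computed weighted sum.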
