Multiplicative Models for Recurrent Language Modeling

Abstract

Recently, there has been interest in multiplicative recurrent neural networks for language modeling. Indeed, simple Recurrent Neural Networks (RNNs) encounter difficulties recovering from past mistakes when generating sequences due to high correlation between hidden states. These challenges can be mitigated by integrating second-order terms in the hidden-state update. One such model, multiplicative Long Short-Term Memory (mLSTM), is particularly interesting in its original formulation because of the sharing of its second-order term, referred to as the intermediate state. We explore these architectural improvements by introducing new models and testing them on character-level language modeling tasks. This allows us to establish the relevance of shared parametrization in recurrent language modeling.
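
To make the idea of a shared second-order term concrete, the following is a minimal sketch of a single mLSTM step, assuming the commonly cited formulation in which the intermediate state is the elementwise product of a projection of the input and a projection of the previous hidden state, and is then reused by every gate. The function names, the parameter dictionary layout, the omission of biases, and the initialization range are illustrative assumptions, not details taken from this paper. Only the Python standard library is used so that the sketch stays self-contained.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hid, seed=0):
    """Hypothetical helper: small random weights, biases omitted for brevity."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {"W_mx": mat(n_hid, n_in), "W_mh": mat(n_hid, n_hid)}
    for g in "ifoh":  # input gate, forget gate, output gate, candidate
        p["W_" + g + "x"] = mat(n_hid, n_in)
        p["W_" + g + "m"] = mat(n_hid, n_hid)
    return p


def mlstm_step(x, h_prev, c_prev, p):
    """One mLSTM step (sketch). Returns the new hidden, cell, and intermediate states."""
    # Second-order term: product of an input projection and a hidden projection.
    m = [a * b for a, b in zip(matvec(p["W_mx"], x), matvec(p["W_mh"], h_prev))]

    def pre(g):
        # Every gate and the candidate read the SAME intermediate state m.
        wx = matvec(p["W_" + g + "x"], x)
        wm = matvec(p["W_" + g + "m"], m)
        return [a + b for a, b in zip(wx, wm)]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    h_hat = pre("h")

    c = [fj * cj + ij * math.tanh(hj) for fj, cj, ij, hj in zip(f, c_prev, i, h_hat)]
    h = [math.tanh(cj) * oj for cj, oj in zip(c, o)]
    return h, c, m


if __name__ == "__main__":
    params = init_params(n_in=3, n_hid=4)
    h, c = [0.0] * 4, [0.0] * 4
    # Feed a short sequence of one-hot "characters".
    for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        h, c, m = mlstm_step(x, h, c, params)
    print(h)
```

In this sketch the sharing is visible in `pre`: a single `m` is computed once per step and feeds all four pre-activations. A non-shared variant would instead compute a separate multiplicative term per gate, at the cost of more parameters.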
