On Evaluating the Generalization of LSTM Models in Formal Languages

Abstract

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and have established themselves as a dominant model for language processing. Yet, there still remains uncertainty regarding their language learning capabilities. In this paper, we empirically evaluate the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, on simple formal languages, in particular $a^nb^n$, $a^nb^nc^n$, and $a^nb^nc^nd^n$. We investigate the influence of various aspects of learning, such as training data regimes and model capacity, on the generalization to unobserved samples. We find striking differences in model performance under different training settings and highlight the need for careful analysis and assessment when making claims about the learning capabilities of neural network models.
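
To make the three languages concrete, the following is a minimal Python sketch of how their strings could be generated and checked. It is an illustration only, not the paper's data pipeline: the function names `generate_sample` and `is_member` are hypothetical, and the choice to require n >= 1 is an assumption, since the abstract does not state whether the empty string is included.

```python
def generate_sample(n: int, symbols: str = "ab") -> str:
    """Return the string in which each symbol is repeated n times, in order.

    symbols="ab" gives a^n b^n, "abc" gives a^n b^n c^n, and
    "abcd" gives a^n b^n c^n d^n.
    """
    # Assumption: n starts at 1; the abstract does not say whether n = 0 is allowed.
    if n < 1:
        raise ValueError("n must be a positive integer")
    return "".join(symbol * n for symbol in symbols)


def is_member(string: str, symbols: str = "ab") -> bool:
    """Check whether `string` belongs to the language defined by `symbols`."""
    k = len(symbols)
    # A valid string has one equal-length block per symbol.
    if not string or len(string) % k != 0:
        return False
    n = len(string) // k
    return string == generate_sample(n, symbols)
```

For example, `generate_sample(3, "abc")` yields `aaabbbccc`, while `is_member("abab")` is false because the symbols are not grouped into two contiguous blocks. Under this sketch, testing generalization to unobserved samples would amount to training on strings for one set of values of n and evaluating on strings whose n was never seen during training.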
