Evaluating the Ability of LSTMs to Learn Context-Free Grammars

Abstract

While long short-term memory (LSTM) neural net architectures are designed tocapture sequence information, human language is generally composed ofhierarchical structures. This raises the question as to whether LSTMs can learnhierarchical structures. We explore this question with a well-formed bracketprediction task using two types of brackets modeled by an LSTM. Demonstratingthat such a system is learnable by an LSTM is the first step in demonstratingthat the entire class of CFLs is also learnable. We observe that the modelrequires exponential memory in terms of the number of characters and embeddeddepth, where a sub-linear memory should suffice. Still, the model does morethan memorize the training input. It learns how to distinguish between relevantand irrelevant information. On the other hand, we also observe that the modeldoes not generalize well. We conclude that LSTMs do not learn the relevantunderlying context-free rules, suggesting the good overall performance isattained rather by an efficient way of evaluating nuisance variables. LSTMs area way to quickly reach good results for many natural language tasks, but tounderstand and generate natural language one has to investigate other conceptsthat can make more direct use of natural language's structural nature.

Quick Read (beta)

loading the full paper ...