Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

Abstract

How do neural language models keep track of number agreement between subject and verb? We show that 'diagnostic classifiers', trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where number information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role played by the representations we find, we then use agreement information to influence the course of the LSTM during the processing of difficult sentences. Results from such an intervention reveal a large increase in the language model's accuracy. Together, these results show that diagnostic classifiers give us an unrivalled detailed look into the representation of linguistic information in neural models, and demonstrate that this knowledge can be used to improve their performance.
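To make the idea of a diagnostic classifier concrete, the following is a minimal sketch, not the authors' implementation. It assumes synthetic vectors in place of real LSTM hidden states, and the names `make_states`, `train_diagnostic_classifier`, and `signal_dim` are hypothetical. A logistic-regression probe is trained to predict number (singular vs. plural) from each vector; if it reaches high held-out accuracy, the number information is linearly decodable from the states.

```python
import math
import random


def make_states(n, dim=8, signal_dim=3, seed=0):
    """Build synthetic stand-ins for hidden states (an assumption for
    illustration; a real study would record activations from the LSTM)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 0 = singular subject, 1 = plural subject
        state = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # One unit carries a noisy number signal; the others are pure noise.
        state[signal_dim] += 2.0 if label == 1 else -2.0
        data.append((state, label))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, state):
    """Linear score of the probe for one hidden state."""
    return sum(wi * xi for wi, xi in zip(w, state)) + b


def train_diagnostic_classifier(data, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for state, label in data:
            err = sigmoid(score(w, b, state)) - label  # gradient of log loss
            w = [wi - lr * err * xi for wi, xi in zip(w, state)]
            b -= lr * err
    return w, b


def accuracy(w, b, data):
    """Fraction of states whose number the probe predicts correctly."""
    correct = sum((score(w, b, s) > 0) == (y == 1) for s, y in data)
    return correct / len(data)


train = make_states(500, seed=0)
test = make_states(200, seed=1)
w, b = train_diagnostic_classifier(train)
acc = accuracy(w, b, test)
print(f"held-out probe accuracy: {acc:.3f}")
```

In this toy setting the largest probe weight falls on the unit that carries the signal, which mirrors how a diagnostic classifier can indicate where in the network number information is represented. Repeating the probe at each timestep and layer would show when and where that information is present.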
