Old BERT, New Tricks: Artificial Language Learning for Pre-Trained Language Models

Abstract

We extend the artificial language learning experimental paradigm from psycholinguistics and apply it to pre-trained language models -- specifically, BERT (Devlin et al., 2019). We treat the model as a subject in an artificial language learning experimental setting: in order to learn the relation between two linguistic properties A and B, we introduce a set of new, non-existent, linguistic items, give the model information about their variation along property A, then measure to what extent the model learns property B for these items as a result of training. We show this method at work for degree modifiers (expressions like "slightly", "very", "rather", "extremely") and test the hypothesis that the degree expressed by modifiers (low, medium or high degree) is related to their sensitivity to sentence polarity (whether they show preference for affirmative or negative sentences or neither). Our experimental results are compatible with existing linguistic observations that relate degree semantics to polarity-sensitivity, including the main one: low degree semantics leads to positive polarity sensitivity (that is, to preference towards affirmative contexts). The method can be used in linguistics to elaborate on hypotheses and interpret experimental results, as well as for more insightful evaluation of linguistic representations in language models.
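
As a rough illustration of the "measure property B" step, the sketch below scores how strongly an item prefers affirmative over negative contexts. It is a minimal sketch under stated assumptions, not the paper's actual metric: the function name `polarity_preference`, the minimal-pair scoring scheme, and the numbers in the usage lines are all hypothetical placeholders. In practice, each pair would hold the probability a masked language model assigns to a novel modifier in an affirmative sentence and in its negated counterpart.

```python
from typing import List, Tuple

# Assumption: each pair is (p_affirmative, p_negative), the probability a
# masked LM assigns to one novel item in an affirmative sentence and in the
# matching negated sentence. How these are obtained is outside this sketch.
Pair = Tuple[float, float]


def polarity_preference(pairs: List[Pair]) -> float:
    """Return a score in [-1, 1].

    +1 means the item is more probable in the affirmative member of every
    minimal pair, -1 means the negative member always wins, and 0 means
    no overall preference. Ties count for neither side.
    """
    if not pairs:
        raise ValueError("need at least one minimal pair")
    wins = sum(1 for p_aff, p_neg in pairs if p_aff > p_neg)
    losses = sum(1 for p_aff, p_neg in pairs if p_aff < p_neg)
    return (wins - losses) / len(pairs)


# Placeholder numbers for illustration only (not experimental results).
before_training = [(0.20, 0.20), (0.10, 0.30), (0.30, 0.20), (0.25, 0.25)]
after_training = [(0.40, 0.10), (0.30, 0.20), (0.10, 0.30), (0.20, 0.20)]

# The quantity of interest is the change in preference caused by training
# the new items on the other property (here, degree).
shift = polarity_preference(after_training) - polarity_preference(before_training)
print(shift)
```

Comparing the score before and after training isolates what the model inferred about polarity from the degree information alone, which is the logic of the paradigm described above.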
