Word Acquisition in Neural Language Models

Abstract

We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words' ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition. Language models rely far more on word frequency than children, but like children, they exhibit slower learning of words in longer utterances. Interestingly, models follow consistent patterns during training for both unidirectional and bidirectional models, and for both LSTM and Transformer architectures. Models predict based on unigram token frequencies early in training, before transitioning loosely to bigram probabilities, eventually converging on more nuanced predictions. These results shed light on the role of distributional learning mechanisms in children, while also providing insights for more human-like language acquisition in language models.
