From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition

Abstract

We examine the language capabilities of language models (LMs) from the critical perspective of human language acquisition. Building on classical language development theories, we propose a three-stage framework for assessing the abilities of LMs, ranging from basic word understanding to complex grammar and logical reasoning. Using this framework, we evaluate the generative capacities of LMs with methods drawn from linguistic research. The results indicate that although recent LMs outperform earlier models overall, their developmental trajectory does not strictly follow the path of human language acquisition. Notably, in generation tasks, LMs are closer to human performance in areas where information is easier to extract from the corpus, such as average word length, clauses, and auxiliary verbs. Newer LMs did not show significant progress on certain dimensions, such as clauses and auxiliary verbs, where variation across corpora is relatively limited. Register theory offers a plausible explanation for these observations, suggesting that the linguistic features of the training data have a substantial impact on the models' abilities.
