Syntactic Language Change in English and German: Metrics, Parsers, and Convergences

  • 2024-03-28 12:16:28
  • Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger
Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German for the different metrics explored: only 4% of the cases we examine yield opposite conclusions regarding upward and downward trends of syntactic metrics across German and English. In addition, changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To the best of our knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.
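To make the metrics named in the abstract concrete, the following is a minimal sketch of how mean dependency distance, tree height, and degree variance might be computed for one parsed sentence. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the input is assumed to be a CoNLL-U-style list of 1-indexed head positions with 0 marking the root, and the exact definitions used in the paper (for example, whether degree counts incoming and outgoing edges, or whether punctuation is excluded) may differ.

```python
from statistics import mean, pvariance

# Assumption: heads[i - 1] is the 1-indexed position of the head of token i,
# and 0 marks the root token (as in CoNLL-U). All names here are hypothetical.

def mean_dependency_distance(heads):
    """Average linear distance |i - head(i)| over all non-root tokens."""
    distances = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return mean(distances) if distances else 0.0

def tree_height(heads):
    """Number of edges on the longest path from the root to any token."""
    def depth(i):
        d = 0
        while heads[i - 1] != 0:  # climb towards the root
            i = heads[i - 1]
            d += 1
        return d
    return max(depth(i) for i in range(1, len(heads) + 1))

def degree_variance(heads):
    """Population variance of node degrees, treating the tree as undirected."""
    degree = [0] * len(heads)
    for i, h in enumerate(heads, start=1):
        if h != 0:
            degree[i - 1] += 1  # edge endpoint at the dependent
            degree[h - 1] += 1  # edge endpoint at the head
    return pvariance(degree)

# "The dog chased the cat": The->dog, dog->chased, chased=root, the->cat, cat->chased
example_heads = [2, 3, 0, 5, 3]
mdd = mean_dependency_distance(example_heads)
height = tree_height(example_heads)
deg_var = degree_variance(example_heads)
```

For the example sentence, the four dependency distances are 1, 1, 1, and 2, so the mean dependency distance is 1.25; the tree height is 2. Averaging such per-sentence values by year, and repeating this for each parser, would give the kind of diachronic trend lines the abstract refers to.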


