Quantifying the dynamics of topical fluctuations in language

  • 2018-06-13 16:26:54
  • Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith

Abstract

The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in the texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a computationally simple model for controlling for topical fluctuations in corpora - the topical-cultural advection model - and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries and on a carefully controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.

 
