Long-Range Correlation Underlying Childhood Language and Generative Models

  • 2017-12-11 04:48:43
  • Kumiko Tanaka-Ishii
  • 2


Long-range correlation, a property of time series exhibiting long-term memory, is mainly studied in the statistical physics domain and has been reported to exist in natural language. Using a state-of-the-art method for such analysis, long-range correlation is first shown to occur in long CHILDES data sets. To understand why, Bayesian generative models of language, originally proposed in the cognitive scientific domain, are investigated. Among representative models, the Simon model was found to exhibit surprisingly good long-range correlation, but not the Pitman-Yor model. Since the Simon model is known not to correctly reflect the vocabulary growth of natural language, a simple new model is devised as a conjunct of the Simon and Pitman-Yor models, such that long-range correlation holds with a correct vocabulary growth rate. The investigation overall suggests that uniform sampling is one cause of long-range correlation and could thus have a relation with actual linguistic processes.
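
To make the role of uniform sampling concrete, the following is a minimal sketch of a Simon-style generative process as it is commonly described: at each step a new word is introduced with a fixed probability, and otherwise a word is copied from a uniformly chosen position in the sequence generated so far. The function name `simon_sequence`, the parameter `new_word_prob`, and the integer word IDs are illustrative assumptions, not the paper's notation or implementation.

```python
import random


def simon_sequence(length, new_word_prob=0.1, seed=0):
    """Generate a sequence of integer word IDs with a Simon-style process.

    Assumed setup (illustrative, not taken from the paper): words are
    integers, the sequence starts with word 0, and `new_word_prob` is the
    constant probability of introducing a previously unseen word.
    """
    if length <= 0:
        return []
    rng = random.Random(seed)
    sequence = [0]   # the first element is necessarily a new word
    next_id = 1      # ID to assign to the next new word
    while len(sequence) < length:
        if rng.random() < new_word_prob:
            # Introduce a new word.
            sequence.append(next_id)
            next_id += 1
        else:
            # Uniform sampling over past positions: frequent words are
            # copied in proportion to how often they already occurred.
            sequence.append(rng.choice(sequence))
    return sequence


if __name__ == "__main__":
    seq = simon_sequence(10_000, new_word_prob=0.1, seed=42)
    print("sequence length:", len(seq))
    print("vocabulary size:", len(set(seq)))
```

Because the probability of a new word is constant in this sketch, the vocabulary grows roughly linearly with the sequence length, which is the mismatch with natural language vocabulary growth that the abstract mentions.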

