A Probabilistic Generative Model of Linguistic Typology

  • 2019-04-09 14:34:57
  • Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein
In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry: we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. Furthermore, we show that language embeddings pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and theoretical implications: the results confirm what linguists have hypothesised, i.e. that there are significant correlations between typological features and languages.
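The abstract does not spell out the model, so what follows is only a minimal sketch of one member of the exponential-family matrix factorisation family: a Bernoulli (logistic) factorisation of a binary language-by-feature matrix. It assumes features are coded 0/1 with held-out entries marked `None`. The names `fit_logistic_mf` and `predict`, the rank, learning rate, regularisation strength and the toy data are hypothetical illustrations, not the authors' implementation. The sketch also leaves out the pre-trained language embeddings used for unseen languages.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (inverse of the Bernoulli natural link).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_mf(Y, rank=2, lr=0.1, reg=0.01, epochs=3000, seed=0):
    """Hypothetical sketch: Bernoulli matrix factorisation of a partially
    observed language-by-feature matrix Y (entries 1, 0 or None = missing).
    Natural parameter: theta[l][f] = dot(U[l], V[f]) + b[f]."""
    rng = random.Random(seed)
    n_lang, n_feat = len(Y), len(Y[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_lang)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_feat)]
    b = [0.0] * n_feat
    observed = [(l, f) for l in range(n_lang) for f in range(n_feat)
                if Y[l][f] is not None]
    for _ in range(epochs):
        # Full-batch gradients of the L2-regularised Bernoulli log-likelihood.
        gU = [[-reg * x for x in row] for row in U]
        gV = [[-reg * x for x in row] for row in V]
        gb = [0.0] * n_feat
        for l, f in observed:
            theta = sum(u * v for u, v in zip(U[l], V[f])) + b[f]
            resid = Y[l][f] - sigmoid(theta)  # d log p(y) / d theta
            for k in range(rank):
                gU[l][k] += resid * V[f][k]
                gV[f][k] += resid * U[l][k]
            gb[f] += resid
        # Gradient-ascent step on all parameters at once.
        for l in range(n_lang):
            for k in range(rank):
                U[l][k] += lr * gU[l][k]
        for f in range(n_feat):
            for k in range(rank):
                V[f][k] += lr * gV[f][k]
            b[f] += lr * gb[f]
    return U, V, b


def predict(U, V, b, l, f):
    """Probability that language l has feature f switched on."""
    return sigmoid(sum(u * v for u, v in zip(U[l], V[f])) + b[f])


# Toy data (hypothetical): two "parameter settings", each switching a pair
# of features on together; None marks held-out entries.
Y = [
    [1, None, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [None, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
U, V, b = fit_logistic_mf(Y)
print(round(predict(U, V, b, 0, 1), 2))  # held out; true value is 1
print(round(predict(U, V, b, 3, 0), 2))  # held out; true value is 0
```

Each entry's natural parameter is a dot product of a language vector and a feature vector, plus a feature bias. A language vector therefore acts loosely like a setting of shared parameters. Features that covary across the observed languages end up with similar vectors, which is what allows a held-out feature to be filled in from a language's other features.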


