A Probabilistic Generative Model of Linguistic Typology

Abstract

In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry: we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. Furthermore, we show that language embeddings pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and theoretical implications: the results confirm what linguists have hypothesised, i.e., that there are significant correlations between typological features and languages.
