A Mathematical Model for Linguistic Universals

Abstract

Inspired by chemical kinetics and neurobiology, we propose a mathematical theory for pattern recurrence in text documents, applicable to a wide variety of languages. We present a Markov model at the discourse level for Steven Pinker's "mentalese", or chains of mental states that transcend the spoken/written forms. Such (potentially) universal temporal structures of textual patterns lead us to a language-independent semantic representation, or a translationally-invariant word embedding, thereby forming the common ground for both comprehensibility within a given language and translatability between different languages. Applying our model to documents of moderate lengths, without relying on external knowledge bases, we reconcile Noam Chomsky's "poverty of stimulus" paradox with statistical learning of natural languages.
