Hebbian learning the local structure of language

Abstract

Learning in the brain is local and unsupervised (Hebbian). We derive the foundations of an effective human language model inspired by these microscopic constraints. It has two parts: (1) a hierarchy of neurons which learns to tokenize words from text (whichiswhatyoudowhenyoureadthis); and (2) additional neurons which bind the learned semanticless patterns of the tokenizer into a semanticful token (an embedding). The model permits continuous parallel learning without forgetting, and is a powerful tokenizer which performs a renormalization group transformation. This allows it to exploit redundancy, such that it generates tokens which are always decomposable into a basis set (e.g., an alphabet), and can mix features learned from multiple languages. We find that the structure of this model allows it to learn a natural language morphology WITHOUT data. The language data generated by this model predicts the correct distribution of word-forming patterns observed in real languages, and further demonstrates why, microscopically, human speech is broken up into words. This model provides the basis for understanding the microscopic origins of language and human creativity.
