Abstract
The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend to distribute information uniformly across a linguistic signal, has gained traction in psycholinguistics as an explanation for certain syntactic, morphological, and prosodic choices. In this work, we explore whether the UID hypothesis can be operationalized as an inductive bias for statistical language modeling. Specifically, we augment the canonical MLE objective for training language models with a regularizer that encodes UID. In experiments on ten languages spanning five language families, we find that using UID regularization consistently improves perplexity in language models, having a larger effect when training data is limited. Moreover, via an analysis of generated sequences, we find that UID-regularized language models have other desirable properties, e.g., they generate text that is more lexically diverse. Our results not only suggest that UID is a reasonable inductive bias for language modeling, but also provide an alternative validation of the UID hypothesis using modern-day NLP tools.
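The abstract does not spell out the form of the UID regularizer. As a minimal sketch, assuming UID is encoded as the variance of per-token surprisals within a sequence, the augmented objective could be computed as below. The function names and the weight `beta` are hypothetical and chosen for illustration; a real training loop would compute the same quantities over model log-probabilities inside an autodiff framework.

```python
import math
import statistics
from typing import List, Sequence


def surprisals(token_probs: Sequence[float]) -> List[float]:
    """Per-token surprisal in nats: -ln p(token | context)."""
    return [-math.log(p) for p in token_probs]


def uid_regularized_loss(token_probs: Sequence[float], beta: float = 1.0) -> float:
    """Sequence-level NLL (the MLE term) plus beta * variance of surprisals.

    Assumption: the variance of surprisals stands in for the UID regularizer.
    It is zero when every token carries the same surprisal and grows as
    information is spread less uniformly across the sequence.
    """
    s = surprisals(token_probs)
    nll = sum(s)                            # standard MLE objective (to minimize)
    uid_penalty = statistics.pvariance(s)   # population variance of surprisals
    return nll + beta * uid_penalty


# Two sequences with the same total NLL (ln 16) but different distributions
# of information across their tokens.
uniform = [0.25, 0.25]   # surprisals: ln 4, ln 4
uneven = [0.5, 0.125]    # surprisals: ln 2, ln 8

print(uid_regularized_loss(uniform, beta=0.0), uid_regularized_loss(uneven, beta=0.0))
print(uid_regularized_loss(uniform), uid_regularized_loss(uneven))
```

With `beta=0.0` the two sequences are indistinguishable under plain MLE; with `beta > 0` the sequence whose information is spread uniformly receives the lower loss, which is the preference the regularizer is meant to encode.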