What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

Abstract

What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, we seek to understand the empirical learnability. Unlike prior empirical work, we evaluate neural LMs on their home turf, learning probabilistic languages, rather than as classifiers of formal languages. In particular, we investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We empirically test the learnability of RLMs as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers. Several other predictors also reach significance, but with differing patterns between RNNs and Transformers.
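
To make the two headline predictors concrete, here is a minimal Python sketch of a toy regular LM. It is an illustration under stated assumptions, not the paper's experimental setup: the three-state automaton, the `LOGITS` and `NEXT_STATE` tables, and the helper names are all hypothetical, and "rank" is taken here to mean the matrix rank of the per-state logit vectors stacked as rows. The third logit row is the sum of the first two, so the rank is 2 rather than 3. The mean sampled length is a Monte Carlo estimate of the expected string length.

```python
import math
import random

# Hypothetical toy model: 3 states, next-symbol alphabet {a, b, EOS}.
SYMBOLS = ["a", "b", "EOS"]

# One logit vector per state; row 2 = row 0 + row 1, so the rows span
# a 2-dimensional space.
LOGITS = [
    [1.0, 0.0, -1.0],
    [0.0, 2.0, 0.5],
    [1.0, 2.0, -0.5],
]

# Deterministic transitions: NEXT_STATE[state][symbol] -> next state.
NEXT_STATE = [
    {"a": 1, "b": 2},
    {"a": 0, "b": 2},
    {"a": 0, "b": 1},
]


def softmax(logits):
    """Turn a logit vector into a probability distribution."""
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def sample_string(rng, max_len=1000):
    """Sample one string by walking the automaton until EOS is drawn."""
    state, out = 0, []
    while len(out) < max_len:
        probs = softmax(LOGITS[state])
        sym = rng.choices(SYMBOLS, weights=probs)[0]
        if sym == "EOS":
            break
        out.append(sym)
        state = NEXT_STATE[state][sym]
    return "".join(out)


rng = random.Random(0)
samples = [sample_string(rng) for _ in range(2000)]
mean_len = sum(len(s) for s in samples) / len(samples)

print("logit rank:", matrix_rank(LOGITS))
print("mean sampled length:", round(mean_len, 2))
```

In this sketch, lowering the rank or raising the EOS logits changes the two quantities independently, which is the sense in which they can be treated as separate complexity parameters.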
