A Latent Space Theory for Emergent Abilities in Large Language Models

Abstract

Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or ε-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
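
As an illustration only, the sparsity argument can be sketched on a toy joint distribution over messages and intentions. Everything in this sketch is an assumption made for the example and is not taken from the paper: the messages, the intention labels, the probabilities, the helper name `analyze`, and the reading of "ε-ambiguous" as "the most likely intention has posterior probability at least 1 - ε". When a row of the joint distribution has a single nonzero entry, the peak value equals the marginal probability of the message, and Bayesian inference recovers the intention with certainty.

```python
def analyze(joint):
    """For each message x, return (best intention, marginal p(x),
    peak joint value p(x, best), posterior p(best | x)).

    `joint[x][theta]` holds the joint probability p(x, theta).
    """
    results = {}
    for x, row in joint.items():
        marginal = sum(row.values())           # p(x) = sum over theta of p(x, theta)
        best_theta = max(row, key=row.get)     # intention with the largest joint value
        peak = row[best_theta]                 # peak value p(x, theta*)
        results[x] = (best_theta, marginal, peak, peak / marginal)  # Bayes' rule
    return results


# Hypothetical toy joint distribution (entries sum to 1).
joint = {
    "it is raining":      {"weather": 0.30, "finance": 0.00, "sports": 0.00},
    "the market fell":    {"weather": 0.00, "finance": 0.25, "sports": 0.00},
    "they won the match": {"weather": 0.00, "finance": 0.00, "sports": 0.25},
    "it went down":       {"weather": 0.02, "finance": 0.16, "sports": 0.02},
}

results = analyze(joint)
for message, (theta, marginal, peak, posterior) in results.items():
    epsilon = 1.0 - posterior  # ambiguity under the assumed definition
    print(f"{message!r}: intention={theta}, p(x)={marginal:.2f}, "
          f"peak={peak:.2f}, posterior={posterior:.2f}, epsilon={epsilon:.2f}")
```

In this toy setting the first three messages are unambiguous (peak equals marginal), while the last one is ambiguous with ε of about 0.2, so its marginal only approximates the peak.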
