An enriched category theory of language: from syntax to semantics

Abstract

State-of-the-art language models return a natural language text continuation from any piece of input text. This ability to generate coherent text extensions implies significant sophistication, including a knowledge of grammar and semantics. In this paper, we propose a mathematical framework for passing from probability distributions on extensions of given texts, such as the ones learned by today's large language models, to an enriched category containing semantic information. Roughly speaking, we model probability distributions on texts as a category enriched over the unit interval. Objects of this category are expressions in language, and hom objects are conditional probabilities that one expression is an extension of another. This category is syntactical: it describes what goes with what. Then, via the Yoneda embedding, we pass to the enriched category of unit-interval-valued copresheaves on this syntactical category. This category of enriched copresheaves is semantic: it is where we find meaning, logical operations such as entailment, and the building blocks for more elaborate semantic concepts.
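The construction can be illustrated on a toy example. The Python sketch below is hypothetical and is not the paper's implementation. It rests on three assumptions: a made-up finite distribution over complete texts (`CORPUS`) stands in for a language model; "y is an extension of x" is read as "x is a prefix of y"; and the unit interval is treated as a monoidal poset under multiplication, with internal hom `[a, b] = min(1, b/a)`. Under these assumptions it builds the hom objects `L(x, y) = pi(y | x)`, checks the enrichment axioms, forms the copresheaves `L(x, -)` given by the Yoneda embedding, and confirms that the embedding is fully faithful. All names (`CORPUS`, `hom`, `yoneda`, `copresheaf_hom`, and so on) are illustrative.

```python
from itertools import product

# Hypothetical toy "language model": a finite distribution over complete
# texts (word tuples). Texts and probabilities are made up for illustration.
CORPUS = {
    ("red", "fire", "truck"): 0.3,
    ("red", "fire", "alarm"): 0.2,
    ("red", "apple"): 0.1,
    ("blue", "sky"): 0.4,
}
EPS = 1e-12  # tolerance for floating-point comparisons


def extends(x, y):
    """Assumption: 'y is an extension of x' means x is a prefix of y."""
    return y[:len(x)] == x


def prob(x):
    """Probability that a text starts with x (sum over its completions)."""
    return sum(p for text, p in CORPUS.items() if extends(x, text))


# Objects of the syntactic category: every nonempty prefix in the corpus.
OBJECTS = sorted({t[:k] for t in CORPUS for k in range(1, len(t) + 1)})


def hom(x, y):
    """Hom object L(x, y) in [0, 1]: pi(y | x) if y extends x, else 0."""
    return prob(y) / prob(x) if extends(x, y) else 0.0


def internal_hom(a, b):
    """Assumed internal hom of ([0,1], *): the largest c with c * a <= b."""
    return 1.0 if a <= b else b / a


def yoneda(x):
    """Copresheaf L(x, -): each expression y is sent to pi(y | x)."""
    return lambda y: hom(x, y)


def is_copresheaf(f):
    """Enriched-functor condition for f: L -> [0,1]: L(y,z) * f(y) <= f(z)."""
    return all(hom(y, z) * f(y) <= f(z) + EPS
               for y, z in product(OBJECTS, repeat=2))


def copresheaf_hom(f, g):
    """Hom object between copresheaves: infimum over y of [f(y), g(y)]."""
    return min(internal_hom(f(y), g(y)) for y in OBJECTS)


# Enrichment axioms: L(x,x) = 1 and L(y,z) * L(x,y) <= L(x,z).
assert all(hom(x, x) == 1.0 for x in OBJECTS)
assert all(hom(y, z) * hom(x, y) <= hom(x, z) + EPS
           for x, y, z in product(OBJECTS, repeat=3))

# Each representable L(x, -) is a copresheaf, and the (contravariant) Yoneda
# embedding is fully faithful: [L(y,-), L(x,-)] recovers L(x, y).
for x in OBJECTS:
    assert is_copresheaf(yoneda(x))
for x, y in product(OBJECTS, repeat=2):
    assert abs(copresheaf_hom(yoneda(y), yoneda(x)) - hom(x, y)) <= EPS

print(hom(("red",), ("red", "fire")))  # pi("red fire" | "red")
```

Prefix extension makes the chain rule `pi(z | x) = pi(z | y) * pi(y | x)` hold exactly, so composition is an equality here. A richer notion of extension, such as containment anywhere in the text, would only need the `extends` function to change.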
