Local and Global Decoding in Text Generation

Abstract

Text generation, a key component in applications such as dialogue systems, relies on decoding algorithms that sample strings from a language model distribution. Traditional methods, such as top-$k$ and top-$\pi$, apply local normalisation to the model's output distribution, which can distort it. In this paper, we investigate the effect of this distortion by introducing globally-normalised versions of these decoding methods. Additionally, we propose an independent Metropolis-Hastings algorithm to approximate sampling from globally-normalised distributions without explicitly computing them. Our empirical analysis compares the performance of local and global normalisation across two decoding algorithms (top-$k$ and top-$\pi$) with various hyperparameters, using Pythia language models. Results show that, in most configurations, global decoding performs worse than the local decoding version of the same algorithms, despite preserving the distribution's integrity. Our results suggest that distortion is an important feature of local decoding algorithms.
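
To make the contrast between local and global normalisation concrete, the following is a minimal sketch of an independent Metropolis-Hastings sampler for globally-normalised top-$k$. It is an illustration under stated assumptions, not the paper's implementation: the bigram table `TOY_MODEL`, the vocabulary, and all function names are hypothetical stand-ins for a real language model. The sketch assumes the proposal is ordinary (locally normalised) top-$k$ sampling, so the importance weight of a string reduces to the product of the per-step top-$k$ masses $Z_t$ that local normalisation divides out.

```python
import math
import random

# Hypothetical toy "language model": the next-token distribution depends
# only on the previous token. A real model would replace this table.
VOCAB = ["a", "b", "c", "<eos>"]
TOY_MODEL = {
    None: [0.5, 0.3, 0.15, 0.05],  # start of sequence
    "a": [0.1, 0.6, 0.2, 0.1],
    "b": [0.4, 0.1, 0.1, 0.4],
    "c": [0.25, 0.25, 0.25, 0.25],
}


def top_k_step(prev, k):
    """Return the k most probable (token, prob) pairs and their total mass Z."""
    ranked = sorted(zip(VOCAB, TOY_MODEL[prev]), key=lambda tp: tp[1], reverse=True)[:k]
    return ranked, sum(p for _, p in ranked)


def sample_local(k, max_len, rng):
    """Draw one string with local top-k sampling.

    Also returns log prod_t Z_t, the log importance weight of the string
    under the globally-normalised target (up to an additive constant).
    """
    tokens, log_w, prev = [], 0.0, None
    for _ in range(max_len):
        ranked, z = top_k_step(prev, k)
        log_w += math.log(z)
        toks, probs = zip(*ranked)
        tok = rng.choices(toks, weights=probs)[0]  # weights are renormalised locally
        tokens.append(tok)
        if tok == "<eos>":
            break
        prev = tok
    return tokens, log_w


def global_top_k_mh(k, max_len, n_steps, seed=0):
    """Independent Metropolis-Hastings with local top-k as the proposal.

    Returns the final state of the chain and the empirical acceptance rate.
    """
    rng = random.Random(seed)
    current, cur_log_w = sample_local(k, max_len, rng)
    accepted = 0
    for _ in range(n_steps):
        proposal, prop_log_w = sample_local(k, max_len, rng)
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < math.exp(min(0.0, prop_log_w - cur_log_w)):
            current, cur_log_w = proposal, prop_log_w
            accepted += 1
    return current, accepted / n_steps


if __name__ == "__main__":
    sample, rate = global_top_k_mh(k=2, max_len=5, n_steps=500)
    print(sample, f"acceptance rate: {rate:.2f}")
```

In this sketch the global normalising constant never has to be computed, because it cancels in the acceptance ratio. When `k` equals the vocabulary size, every $Z_t$ is 1, the local and global distributions coincide, and essentially every proposal is accepted.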
