EEL: Efficiently Encoding Lattices for Reranking

Abstract

Standard decoding approaches for conditional text generation tasks typically search for an output hypothesis with high model probability, but this may not yield the best hypothesis according to human judgments of quality. Reranking to optimize for "downstream" metrics can better optimize for quality, but many metrics of interest are computed with pre-trained language models, which are slow to apply to large numbers of hypotheses. We explore an approach for reranking hypotheses by using Transformers to efficiently encode lattices of generated outputs, a method we call EEL. With a single Transformer pass over the entire lattice, we can approximately compute a contextualized representation of each token as if it were only part of a single hypothesis in isolation. We combine this approach with a new class of token-factored rerankers (TFRs) that allow for efficient extraction of high reranker-scoring hypotheses from the lattice. Empirically, our approach incurs minimal degradation error compared to the exponentially slower approach of encoding each hypothesis individually. When applying EEL with TFRs across three text generation tasks, our results show both substantial speedup compared to naive reranking and often better performance on downstream metrics than comparable approaches.
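
To illustrate why a token-factored score makes extraction from a lattice efficient, the following is a minimal sketch and not the paper's implementation. It assumes that a hypothesis score is the sum of per-token scores and that the lattice is a directed acyclic graph with one token per node; the function name `best_hypothesis`, the toy lattice, and the score values are all hypothetical. Under those assumptions, the highest-scoring path can be found with dynamic programming in time linear in the number of edges, without enumerating every hypothesis.

```python
from collections import defaultdict, deque


def best_hypothesis(tokens, scores, edges, start, end):
    """Return (token list, total score) of the best start-to-end path.

    tokens: dict node_id -> token string
    scores: dict node_id -> per-token reranker score (assumed additive)
    edges:  list of (u, v) pairs forming a directed acyclic graph
    Assumes `end` is reachable from `start`.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: visit each node only after all its predecessors.
    order = []
    queue = deque(n for n in tokens if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # best[n] is the highest summed score of any path from start to n.
    best = {start: scores[start]}
    back = {}
    for u in order:
        if u not in best:
            continue  # node is not reachable from start
        for v in succ[u]:
            cand = best[u] + scores[v]
            if v not in best or cand > best[v]:
                best[v] = cand
                back[v] = u

    # Follow back-pointers from end to start to recover the path.
    path = [end]
    while path[-1] != start:
        path.append(back[path[-1]])
    path.reverse()
    return [tokens[n] for n in path], best[end]


# Toy lattice encoding three hypotheses:
# "the cat sat", "the dog sat", "a dog sat".
tokens = {0: "<s>", 1: "the", 2: "a", 3: "cat", 4: "dog", 5: "sat", 6: "</s>"}
scores = {0: 0.0, 1: 2.0, 2: 5.0, 3: 9.0, 4: 1.0, 5: 3.0, 6: 0.0}
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 4), (3, 5), (4, 5), (5, 6)]

hyp, total = best_hypothesis(tokens, scores, edges, start=0, end=6)
print(" ".join(hyp), total)
```

In this toy lattice, "a" has the highest score among the first tokens, yet the best complete path goes through "the", which is why the sketch scores whole paths rather than choosing greedily token by token.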
