Better Document-Level Machine Translation with Bayes' Rule

Abstract

We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit as parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output document and the "reverse translation probability" of translating the candidate output back into the source language. Our proposed model uses a powerful autoregressive language model as the prior on target language documents, but it assumes that each sentence is translated independently from the target to the source language. Crucially, at test time, when a source document is observed, the document language model prior induces dependencies between the translations of the source sentences in the posterior. The model's independence assumption not only enables efficient use of available data, but it additionally admits a practical left-to-right beam-search algorithm for carrying out inference. Experiments show that our model benefits from using cross-sentence context in the language model, and it outperforms existing document translation approaches.
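
The formulation described above can be written out as a short derivation. This is a sketch in notation assumed here, not taken from the paper: x = (x_1, ..., x_I) is the source document, y = (y_1, ..., y_I) is a candidate translation with one output sentence per source sentence, p_LM is the document language model, and p_TM is the sentence-level reverse translation model.

```latex
\begin{align*}
\hat{y} &= \arg\max_{y} \; p(y \mid x) \\
        &= \arg\max_{y} \; p(y)\, p(x \mid y)
           && \text{(Bayes' rule; } p(x) \text{ is constant in } y\text{)} \\
        &= \arg\max_{y} \; \prod_{i=1}^{I}
           p_{\mathrm{LM}}(y_i \mid y_{<i}) \; p_{\mathrm{TM}}(x_i \mid y_i)
           && \text{(autoregressive prior; independent reverse translation)}
\end{align*}
```

Each factor of the product depends only on the current sentence pair and the previously generated output sentences y_{<i}, which is what makes a left-to-right search over sentences feasible. The prior term p_LM(y_i | y_{<i}) is also the only place where the sentences of the output document interact.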
