Reranking Machine Translation Hypotheses with Structured and Web-based Language Models

Abstract

In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system. These language models, developed from Constraint Dependency Grammar parses, tightly integrate knowledge of words, morphological and lexical features, and syntactic dependency constraints. Two structured language models are applied for N-best rescoring: one is an almost-parsing language model, and the other utilizes more syntactic features by explicitly modeling syntactic dependencies between words. We also investigate effective and efficient language modeling methods to use N-grams extracted from up to 1 teraword of web documents. We apply all these language models for N-best reranking on the NIST and DARPA GALE program 2006 and 2007 machine translation evaluation tasks and find that the combination of these language models increases the BLEU score by up to 1.6% absolute on blind test sets.
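
As an illustration of N-best reranking with combined language model scores, the following is a minimal sketch and not the system evaluated in the paper. It assumes each hypothesis already carries log-domain scores from a baseline decoder and from two additional language models, and that the scores are combined by a weighted sum. The feature names (`baseline`, `structured_lm`, `web_ngram_lm`), the weights, and all score values are hypothetical and chosen only for demonstration.

```python
def rerank(nbest, weights):
    """Sort hypotheses by a weighted sum of their feature scores, best first.

    nbest   : list of dicts with a "text" string and a "scores" dict
    weights : dict mapping feature name -> weight (hypothetical values)
    """
    def combined(hyp):
        # Weighted sum over the features named in `weights`.
        return sum(weights[name] * hyp["scores"][name] for name in weights)

    return sorted(nbest, key=combined, reverse=True)


# Toy N-best list with made-up log-domain scores (higher is better).
nbest = [
    {"text": "the cat sat on mat",
     "scores": {"baseline": -4.0, "structured_lm": -9.0, "web_ngram_lm": -8.5}},
    {"text": "the cat sat on the mat",
     "scores": {"baseline": -4.5, "structured_lm": -6.0, "web_ngram_lm": -5.0}},
]

# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {"baseline": 1.0, "structured_lm": 0.5, "web_ngram_lm": 0.5}

baseline_best = rerank(nbest, {"baseline": 1.0})[0]
best = rerank(nbest, weights)[0]
print(baseline_best["text"])
print(best["text"])
```

In this toy list, the baseline score alone prefers the first hypothesis, while adding the two language model scores moves the second hypothesis to the top, which is the kind of reordering that reranking is meant to produce.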
