Examining the Tip of the Iceberg: A Data Set for Idiom Translation

  • 2018-02-13 15:25:21
  • Marzieh Fadaee, Arianna Bisazza, Christof Monz

Abstract

Neural Machine Translation (NMT) has been widely used in recent years with significant improvements for many language pairs. Although state-of-the-art NMT systems are generating progressively better translations, idiom translation remains one of the open challenges in this field. Idioms, a category of multiword expressions, are an interesting language phenomenon where the overall meaning of the expression cannot be composed from the meanings of its parts. A first important challenge is the lack of dedicated data sets for learning and evaluating idiom translation. In this paper we address this problem by creating the first large-scale data set for idiom translation. Our data set is automatically extracted from a widely used German-English translation corpus and includes, for each language direction, a targeted evaluation set where all sentences contain idioms and a regular training corpus where sentences including idioms are marked. We release this data set and use it to perform preliminary NMT experiments as the first step towards better idiom translation.

 
