Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System

  • 2018-05-23 13:20:16
  • Judith Gaspers, Penny Karanasou, Rajen Chatterjee
  • 3

Abstract

This paper investigates the use of Machine Translation (MT) to bootstrap a Natural Language Understanding (NLU) system for a new language, for the use case of a large-scale voice-controlled device. The goal is to decrease the cost and time needed to obtain an annotated corpus for the new language, while still having large enough coverage of user requests. Different methods of filtering MT data to keep utterances that improve NLU performance are investigated, along with language-specific post-processing methods. These methods are tested on a large-scale NLU task by translating around 10 million training utterances from English to German. The results show a large improvement from using MT data over both a grammar-based baseline and an in-house data collection baseline, while greatly reducing the manual effort. Both the filtering and the post-processing approaches improve results further.

 
