One-To-Many Multilingual End-to-end Speech Translation

  • 2019-10-08 10:29:09
  • Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi

Abstract

Training end-to-end neural models for spoken language translation (SLT) still has to contend with extreme data scarcity. Existing SLT parallel corpora are indeed orders of magnitude smaller than those available for the closely related tasks of automatic speech recognition (ASR) and machine translation (MT), which usually comprise tens of millions of instances. To cope with data paucity, in this paper we explore the effectiveness of transfer learning in end-to-end SLT by presenting a multilingual approach to the task. Multilingual solutions are widely studied in MT and usually rely on "target forcing", in which multilingual parallel data are combined to train a single model by prepending to the input sequences a language token that specifies the target language. However, when tested in speech translation, our experiments show that MT-like target forcing, used as is, is not effective in discriminating among the target languages. Thus, we propose a variant that uses target-language embeddings to shift the input representations into different portions of the space according to the language, so as to better support the production of output in the desired target language. Our experiments on end-to-end SLT from English into six languages show important improvements when translating into similar languages, especially when these are supported by scarce data. Further improvements are obtained when using English ASR data as an additional language (up to +2.5 BLEU points).
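To make the contrast between the two conditioning schemes concrete, here is a minimal sketch in plain Python. It assumes feature frames are lists of floats and uses a seeded random table as a stand-in for learned target-language embeddings. The names `make_language_embeddings`, `target_forcing`, and `language_shift` are hypothetical, and the additive shift is one plausible reading of "shifting the input representations", not necessarily the paper's exact formulation.

```python
import random
from typing import Dict, List

Vector = List[float]
Sequence = List[Vector]


def make_language_embeddings(langs: List[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical stand-in for a learned embedding table: one random vector per target language."""
    rng = random.Random(seed)
    return {lang: [rng.gauss(0.0, 1.0) for _ in range(dim)] for lang in langs}


def target_forcing(frames: Sequence, lang: str, table: Dict[str, Vector]) -> Sequence:
    """MT-style target forcing: prepend the language vector as one extra input position."""
    return [list(table[lang])] + [list(f) for f in frames]


def language_shift(frames: Sequence, lang: str, table: Dict[str, Vector]) -> Sequence:
    """Assumed variant: add the language vector to every input frame, so the
    whole sequence is moved into a language-specific region of the space."""
    shift = table[lang]
    return [[x + s for x, s in zip(f, shift)] for f in frames]


# Toy input: 3 feature frames of dimension 4 (speech systems typically use
# filterbank-style features; zeros are used here only to make the effect visible).
dim = 4
# Hypothetical language set; "en" stands for English ASR treated as one more target.
table = make_language_embeddings(["de", "fr", "en"], dim)
frames = [[0.0] * dim for _ in range(3)]

forced = target_forcing(frames, "de", table)
shifted_de = language_shift(frames, "de", table)
shifted_fr = language_shift(frames, "fr", table)

# Target forcing: only the prepended position carries language information.
print(len(forced), forced[1:] == frames)
# Additive shift: every position differs between the two target languages.
print(all(a != b for a, b in zip(shifted_de, shifted_fr)))
```

Intuitively, with target forcing the language signal sits in a single position that the encoder has to carry across a long sequence of audio frames, whereas the additive shift places it at every frame. Under this sketch, adding English ASR data as an extra "language" amounts to one more row (`"en"`) in the embedding table.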
