Neural Machine Translation into Language Varieties

Abstract

Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services are not keeping distinct. We show that an evident side effect of modeling such varieties as unique classes is the generation of inconsistent translations. In this work, we investigate the problem of training neural machine translation from English to specific pairs of language varieties, assuming both labeled and unlabeled parallel texts, and low-resource conditions. We report experiments from English to two pairs of dialects, European-Brazilian Portuguese and European-Canadian French, and two pairs of standardized varieties, Croatian-Serbian and Indonesian-Malay. We show significant BLEU score improvements over baseline systems when translation into similar languages is learned as a multilingual task with shared representations.
