Unsupervised Machine Translation On Dravidian Languages

Abstract

Unsupervised neural machine translation (UNMT) is especially beneficial for low-resource languages such as those from the Dravidian family. However, UNMT systems tend to fail in realistic scenarios involving actual low-resource languages. Recent work proposes using auxiliary parallel data and has achieved state-of-the-art results. In this work, we focus on unsupervised translation between English and Kannada, a low-resource Dravidian language. We additionally utilize a limited amount of auxiliary data between English and other related Dravidian languages. We show that unifying the writing systems is essential in unsupervised translation between the Dravidian languages. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for distant language pairs. Our experiments demonstrate that it is crucial to include auxiliary languages that are similar to our focal language, Kannada. Furthermore, we propose a metric to measure language similarity and show that it serves as a good indicator for selecting the auxiliary languages.
