Sequence to Sequence Networks for Roman-Urdu to Urdu Transliteration

  • 2017-12-08 06:36:54
  • Mehreen Alam, Sibt ul Hussain
  • 4

Abstract

Neural machine translation models have replaced conventional phrase-based statistical translation methods because they take a generic, scalable, data-driven approach rather than relying on manual, hand-crafted features. A neural machine translation system is a single neural network composed of two parts: one that processes the input-language sentence and another that produces the desired output-language sentence. This encoder-decoder model also takes as input distributed representations of the source language, which enrich the learnt dependencies and give the network a warm start. In this work, we cast Roman-Urdu to Urdu transliteration as a sequence-to-sequence learning problem. To this end, we make the following contributions: we create the first parallel corpora for Roman-Urdu to Urdu, create the first distributed representation of Roman-Urdu, and present the first neural machine translation model that transliterates text from Roman-Urdu to Urdu. Our model achieves state-of-the-art results using BLEU as the evaluation metric. Specifically, it correctly predicts sentences of up to length 10 and achieves a BLEU score of 48.6 on the test set. We hope that our model and results will serve as a baseline for further work on neural machine translation for Roman-Urdu to Urdu using distributed representations.
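Since the reported result is a BLEU score, a minimal sketch of how such a score can be computed may help. The abstract does not state the tokenization, n-gram order, smoothing, or whether the score is sentence-level or corpus-level, so everything below is an assumption: the function names `ngrams` and `sentence_bleu` are hypothetical, the score is unsmoothed, single-reference, and sentence-level, and it uses n-grams up to order 4 on a 0-100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed, single-reference BLEU on a 0-100 scale (illustrative only)."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, a single zero precision zeroes the score.
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)
```

Both arguments are lists of tokens, for example the whitespace-split Urdu reference and the model's Urdu output for one Roman-Urdu input. A score reported over a whole test set is normally computed at corpus level, pooling n-gram counts across all sentences before taking the precisions, which this sentence-level sketch does not do.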

 
