Urdu-English Machine Transliteration using Neural Networks

  • 2020-01-12 17:30:42
  • Usman Mohy ud Din

Abstract

Machine translation has gained much attention in recent years. It is a sub-field of computational linguistics that focuses on translating text from one language to another. Among the different translation techniques, neural networks currently lead the domain, offering a single large network that combines attention mechanisms, sequence-to-sequence learning and long short-term memory modelling. Despite significant progress in machine translation, the translation of out-of-vocabulary (OOV) words, which include technical terms, named entities and foreign words, is still a challenge for current state-of-the-art translation systems. The situation becomes even worse when translating between low-resource languages or languages with different structures. Because of the morphological richness of a language, a word may have different meanings in different contexts. In such scenarios, translating the word alone is not enough to provide a correct, high-quality translation. Transliteration is a way to take the context of a word or sentence into account during translation. For a low-resource language like Urdu, it is very difficult to find a parallel transliteration corpus that is large enough to train a system. In this work, we present a transliteration technique based on Expectation Maximization (EM) that is unsupervised and language independent. The system learns patterns and out-of-vocabulary (OOV) words from a parallel corpus, so there is no need to train it explicitly on a transliteration corpus. The approach is tested on three statistical machine translation (SMT) models, namely phrase-based, hierarchical phrase-based and factor-based models, and on two neural machine translation models, namely an LSTM model and a transformer model.
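
To make the EM idea concrete, the following is a minimal sketch of character-level EM in the style of IBM Model 1. It is an illustration only, not the system described in this paper: the function name `train_char_em`, the iteration count and the toy pairs are all assumptions, and the Latin letters simply stand in for source-script and target-script characters.

```python
from collections import defaultdict


def train_char_em(pairs, iterations=20):
    """Estimate t[(src_char, tgt_char)] = P(tgt_char | src_char) with EM.

    Hypothetical helper for illustration (IBM Model 1 style, no NULL symbol).
    `pairs` is a list of (source_word, target_word) strings.
    """
    tgt_vocab = {c for _, tgt in pairs for c in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start from a uniform table: every character mapping is equally likely.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (src_char, tgt_char)
        total = defaultdict(float)  # expected count of src_char

        # E-step: share each target character among the source characters
        # of the same word pair, in proportion to the current probabilities.
        for src, tgt in pairs:
            for tc in tgt:
                norm = sum(t[(sc, tc)] for sc in src)
                for sc in src:
                    frac = t[(sc, tc)] / norm
                    count[(sc, tc)] += frac
                    total[sc] += frac

        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(
            float,
            {(sc, tc): c / total[sc] for (sc, tc), c in count.items()},
        )
    return t


# Toy data (assumed): 'a' and 'b' stand in for source-script characters,
# 'x' and 'y' for target-script characters.
toy_pairs = [("ab", "xy"), ("a", "x"), ("b", "y")]
table = train_char_em(toy_pairs)
```

On this toy input the table concentrates probability on the mappings a to x and b to y without any character-level supervision, which is the property an unsupervised approach relies on. A practical system would need many-to-many character alignments and a way to separate transliteration pairs from ordinary translation pairs.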

 
