Neural Machine Translation of Logographic Languages Using Sub-character Level Information

  • 2018-09-07 22:02:43
  • Longtu Zhang, Mamoru Komachi

Abstract

Recent neural machine translation (NMT) systems have been greatly improved by encoder-decoder models with attention mechanisms and sub-word units. However, important differences between languages with logographic and alphabetic writing systems have long been overlooked. This study focuses on these differences and uses a simple approach to improve the performance of NMT systems for logographic languages by utilizing decomposed sub-character level information. Our results indicate that this approach not only improves the translation capabilities of NMT systems between Chinese and English, but also further improves NMT systems between Chinese and Japanese, because it utilizes the shared information brought by similar sub-character units.
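The idea of sharing sub-character units between Chinese and Japanese can be illustrated with a minimal sketch. It assumes a hypothetical, hand-written one-level decomposition table (`DECOMP`) and hypothetical helpers `decompose` and `shared_units`. This is not the authors' pipeline, which would rely on a full ideograph-decomposition resource and feed the resulting units into an NMT system. The point it shows: the simplified Chinese 语 and the Japanese 語 ("language") are different characters, yet both contain the component 吾, so a model working on sub-character units can share information between them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-written decomposition table (one level, left/right
# components only). A real system would derive this from a complete
# ideograph-decomposition resource; these few entries are illustrative.
DECOMP: Dict[str, Tuple[str, ...]] = {
    "好": ("女", "子"),
    "明": ("日", "月"),
    "休": ("亻", "木"),
    "语": ("讠", "吾"),  # simplified Chinese form of "language"
    "語": ("言", "吾"),  # Japanese (and traditional Chinese) form
}


def decompose(text: str) -> List[str]:
    """Replace each character with its sub-character units.

    Characters missing from the table (e.g. Latin letters, kana,
    punctuation, or unlisted ideographs) are passed through unchanged.
    """
    units: List[str] = []
    for ch in text:
        units.extend(DECOMP.get(ch, (ch,)))
    return units


def shared_units(a: str, b: str) -> Set[str]:
    """Return the sub-character units that occur in both strings."""
    return set(decompose(a)) & set(decompose(b))


if __name__ == "__main__":
    print(decompose("汉语"))          # 汉 is not in the table, so it is kept whole
    print(shared_units("语", "語"))   # the two forms share the component 吾
    print(set("语") & set("語"))      # at the character level nothing is shared
```

At the character level the two forms of "language" have no overlap, while after decomposition they share 吾. This is the kind of shared information between Chinese and Japanese that the abstract refers to.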

 
