edATLAS: An Efficient Disambiguation Algorithm for Texting in Languages with Abugida Scripts

  • 2021-01-05 03:16:34
  • Sourav Ghosh, Sourabh Vasant Gothe, Chandramouli Sanchi, Barath Raj Kandur Raja

Abstract

Abugida refers to a phonogram writing system where each syllable is represented using a single consonant or typographic ligature, along with a default vowel or optional diacritic(s) to denote other vowels. However, texting in these languages has some unique challenges in spite of the advent of devices with soft keyboard supporting custom key layouts. The number of characters in these languages is large enough to require characters to be spread over multiple views in the layout. Having to switch between views many times to type a single word hinders the natural thought process. This prevents popular usage of native keyboard layouts. On the other hand, supporting romanized scripts (native words transcribed using Latin characters) with language model based suggestions is also set back by the lack of uniform romanization rules. To this end, we propose a disambiguation algorithm and showcase its usefulness in two novel mutually non-exclusive input methods for languages natively using the abugida writing system: (a) disambiguation of ambiguous input for abugida scripts, and (b) disambiguation of word variants in romanized scripts. We benchmark these approaches using public datasets, and show an improvement in typing speed by 19.49%, 25.13%, and 14.89%, in Hindi, Bengali, and Thai, respectively, using Ambiguous Input, owing to the human ease of locating keys combined with the efficiency of our inference method. Our Word Variant Disambiguation (WDA) maps valid variants of romanized words, previously treated as Out-of-Vocab, to a vocabulary of 100k words with high accuracy, leading to an increase in Error Correction F1 score by 10.03% and Next Word Prediction (NWP) by 62.50% on average.

 
