Language Detection Engine for Multilingual Texting on Mobile Devices

  • 2021-01-07 16:49:47
  • Sourabh Vasant Gothe, Sourav Ghosh, Sharmila Mani, Guggilla Bhanodai, Ankur Agarwal, Chandramouli Sanchi
  • 2

Abstract

More than 2 billion mobile users worldwide type in multiple languages on the soft keyboard. On a monolingual keyboard, 38% of falsely auto-corrected words are valid in another language. This can be easily avoided by detecting the language of each typed word and then validating the word in its respective language. Language detection is a well-known problem in natural language processing. In this paper, we present a fast, lightweight and accurate Language Detection Engine (LDE) for multilingual typing that dynamically adapts to the user's intended language in real time. We propose a novel approach that fuses a character N-gram model with a logistic regression based selector model to identify the language. Additionally, we present a unique method that significantly reduces inference time through a parameter reduction technique. We also discuss various optimizations implemented across LDE to resolve ambiguity in input text among languages that share the same character patterns. Our method demonstrates an average accuracy of 94.5% for Indian languages in Latin script and 98% for European languages on code-switched data. This model outperforms fastText by 60.39% and ML-Kit by 23.67% in F1 score for European languages. LDE is fast on mobile devices, with an average inference time of 25.91 microseconds.
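The abstract names a fusion of a character N-gram model and a logistic regression based selector but does not describe either in detail. The following is a minimal Python sketch under stated assumptions, not the authors' LDE: it assumes per-language character trigram models with add-one smoothing, feeds the gap between two length-normalized log-probabilities to a binary logistic-regression selector, and trains on two tiny toy word lists (English and Spanish) that are purely illustrative. The names `char_ngrams`, `CharNgramModel`, `LogisticSelector` and `detect` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Overlapping character n-grams; '<' and '>' mark the word boundaries."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Hypothetical per-language character n-gram model with add-one smoothing."""

    def __init__(self, words, n=3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        # +1 reserves probability mass for n-grams never seen in training.
        self.denom = self.total + len(self.counts) + 1

    def score(self, word):
        """Average log-probability per n-gram, so long and short words compare fairly."""
        grams = char_ngrams(word, self.n)
        logp = sum(math.log((self.counts[g] + 1) / self.denom) for g in grams)
        return logp / max(len(grams), 1)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticSelector:
    """Binary logistic regression over the score gap between two n-gram models (assumed design)."""

    def __init__(self, model_a, model_b):
        self.a, self.b = model_a, model_b
        self.w, self.bias = 0.0, 0.0

    def feature(self, word):
        return self.a.score(word) - self.b.score(word)

    def fit(self, words, labels, lr=0.5, epochs=300):
        """labels: 1 if the word is in language A, else 0. Plain batch gradient descent on log loss."""
        xs = [self.feature(w) for w in words]
        m = len(xs)
        for _ in range(epochs):
            gw = gb = 0.0
            for x, y in zip(xs, labels):
                err = sigmoid(self.w * x + self.bias) - y  # d(loss)/d(logit)
                gw += err * x
                gb += err
            self.w -= lr * gw / m
            self.bias -= lr * gb / m
        return self

    def prob_a(self, word):
        return sigmoid(self.w * self.feature(word) + self.bias)


# Toy word lists, illustrative only (not the paper's data).
EN = ["the", "and", "this", "that", "with", "have", "what", "there", "think", "thing",
      "where", "which", "would", "should", "could", "thank", "hello", "world", "night", "light"]
ES = ["el", "los", "las", "que", "para", "como", "pero", "muy", "gracias", "hola",
      "mundo", "noche", "casa", "perro", "gato", "bueno", "nada", "todo", "amigo", "ciudad"]

en_model = CharNgramModel(EN)
es_model = CharNgramModel(ES)
selector = LogisticSelector(en_model, es_model).fit(EN + ES, [1] * len(EN) + [0] * len(ES))


def detect(word):
    """Return 'en' or 'es' for a single typed word."""
    return "en" if selector.prob_a(word) >= 0.5 else "es"


if __name__ == "__main__":
    for w in ["the", "gracias", "thinking", "perros"]:
        print(w, detect(w), round(selector.prob_a(w), 3))
```

With more than two languages, the binary selector would naturally become a multinomial (softmax) regression over one score per language model. The sketch does not attempt to reproduce the paper's parameter reduction or its ambiguity-resolution optimizations.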

 
