Local Translation Services for Neglected Languages

Abstract

Computationally lightweight but high-quality translators prompt consideration of new applications that address neglected languages. Locally run translators for less popular languages may assist data projects whose protected or personal data would require specific compliance checks before being posted to a public translation API, but which could yield reasonable, cost-effective solutions if handled by an army of local, small-scale pair translators. Like handling a specialist's dialect, this research illustrates translating two historically interesting but obfuscated languages: 1) hacker-speak ("l33t") and 2) reverse (or "mirror") writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution highlights a fluent translator of hacker-speak in under 50 megabytes and demonstrates a generator for augmenting future datasets with more than a million bilingual sentence pairs. The long short-term memory recurrent neural network (LSTM-RNN) extends previous work demonstrating an English-to-foreign translation service built from as few as 10,000 bilingual sentence pairs. This work further solves the equivalent translation problem in twenty-six additional (non-obfuscated) languages and rank-orders those models by quantitative proficiency, with Italian as the most successful and Mandarin Chinese as the most challenging. For neglected languages, the method prototypes novel services for smaller niche translations such as Kabyle (Algerian dialect), which has between 5 and 7 million speakers but which most enterprise translators do not yet support. One anticipates the extension of this approach to other important dialects, such as translating technical (medical or legal) jargon and processing health records.
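
As a rough illustration of how bilingual sentence pairs for the two obfuscated languages might be generated, the following Python sketch applies a character substitution for hacker-speak and a string reversal for mirror writing. The substitution table `LITE_LEET` and the function names `to_leet`, `to_mirror`, and `make_pairs` are assumptions introduced here for illustration; the abstract does not specify the lite, medium, or hard vocabularies, nor the generator actually used.

```python
import random

# Hypothetical "lite" substitution table (an assumption for illustration;
# the vocabularies used in the paper are not given in the abstract).
LITE_LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def to_leet(text, table=LITE_LEET, rate=1.0, rng=None):
    """Replace characters found in `table` with their hacker-speak form.

    `rate` is the probability that an eligible character is substituted;
    a value of 1.0 substitutes every eligible character.
    """
    rng = rng or random.Random(0)
    out = []
    for ch in text:
        sub = table.get(ch.lower())
        if sub is not None and rng.random() < rate:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def to_mirror(text):
    """Reverse the character order, a simple stand-in for mirror writing."""
    return text[::-1]


def make_pairs(sentences, obfuscate):
    """Build (English, obfuscated) sentence pairs for training data."""
    return [(s, obfuscate(s)) for s in sentences]


if __name__ == "__main__":
    corpus = ["leet speak", "hello world"]
    print(make_pairs(corpus, to_leet))
    print(make_pairs(corpus, to_mirror))
```

Lowering `rate` below 1.0 yields several distinct obfuscated variants of the same English sentence, which is one plausible way a small corpus could be expanded toward a much larger set of pairs.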
