A reproduction of Apple's bi-directional LSTM models for language identification in short strings

  • 2021-02-11 21:46:43
  • Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent

Abstract

Language identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.

 
