Abstract
Exploiting cognates for transfer learning in under-resourced languages is an exciting opportunity for language understanding tasks, including unsupervised machine translation, named entity recognition, and information retrieval. Previous approaches have mainly focused on supervised cognate detection based on orthographic or phonetic features or on state-of-the-art contextual language models, which under-perform for most under-resourced languages. This paper proposes a novel language-agnostic weakly-supervised deep cognate detection framework for under-resourced languages using morphological knowledge from closely related languages. We train an encoder to gain morphological knowledge of a language and transfer that knowledge to perform unsupervised and weakly-supervised cognate detection, with and without a pivot language, for closely related languages. In the unsupervised setting, it removes the need for hand-crafted annotation of cognates. We ran experiments on several published cognate detection datasets across language families and observed a significant improvement over the state of the art, with our method outperforming both supervised and unsupervised state-of-the-art methods. Our model can be extended to a wide range of languages from any language family, since it does not require annotated cognate pairs for training. The code and dataset building scripts can be found at https://github.com/koustavagoswami/Weakly_supervised-Cognate_Detection