Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer

  • 2024-10-11 08:01:46
  • Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim, David R. Mortensen

Abstract

Approaches to improving multilingual language understanding often struggle with significant performance gaps between high-resource and low-resource languages. While there are efforts to align the languages in a single latent space to mitigate such gaps, how different input-level representations influence such gaps has not been investigated, particularly with phonemic inputs. We hypothesize that the performance gaps are affected by representation discrepancies between these languages, and revisit the use of phonemic representations as a means to mitigate these discrepancies. To demonstrate the effectiveness of phonemic representations, we present experiments on three representative cross-lingual tasks on 12 languages in total. The results show that phonemic representations exhibit higher similarities between languages compared to orthographic representations, and that they consistently outperform the grapheme-based baseline model on languages that are relatively low-resourced. We present quantitative evidence from three cross-lingual tasks that demonstrates the effectiveness of phonemic representations, which is further justified by a theoretical analysis of the cross-lingual performance gap.

 
