Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise

  • 2021-09-14 15:38:08
  • Noëmi Aepli, Rico Sennrich
Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity, but current approaches that operate in the embedding space do not take surface similarity into account. In this work, we present a simple yet effective strategy to improve cross-lingual transfer between closely related varieties by augmenting the data of the high-resource parent language with character-level noise to make the model more robust towards spelling variations. Our strategy shows consistent improvements over several languages and tasks: Zero-shot transfer of POS tagging and topic identification between language varieties from the Germanic, Uralic, and Romance language genera. Our work provides evidence for the usefulness of simple surface-level noise in improving transfer between language varieties.
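
To make the idea of character-level noise concrete, the following is a minimal sketch of how such augmentation might look. The abstract does not specify the noise operations or rates, so everything here is an assumption for illustration: the function name `add_char_noise`, the choice of insertion, deletion, and substitution as operations, the default noise probability, and the lowercase ASCII alphabet are all hypothetical.

```python
import random
import string


def add_char_noise(tokens, word_noise_prob=0.1,
                   alphabet=string.ascii_lowercase, rng=None):
    """Return a copy of `tokens` with random character edits in some words.

    Illustrative assumptions (not taken from the abstract): each token is
    perturbed with probability `word_noise_prob` by exactly one random
    insertion, deletion, or substitution of a character from `alphabet`.
    """
    rng = rng or random.Random()
    noisy = []
    for tok in tokens:
        if tok and rng.random() < word_noise_prob:
            pos = rng.randrange(len(tok))
            op = rng.choice(("insert", "delete", "substitute"))
            if op == "insert":
                tok = tok[:pos] + rng.choice(alphabet) + tok[pos:]
            elif op == "delete" and len(tok) > 1:
                tok = tok[:pos] + tok[pos + 1:]
            else:
                # Substitution; also the fallback for deleting from a
                # one-character token, so that no token becomes empty.
                tok = tok[:pos] + rng.choice(alphabet) + tok[pos + 1:]
        noisy.append(tok)
    return noisy


# Usage: perturb a tokenized sentence from the high-resource language.
sentence = ["das", "ist", "ein", "kleines", "haus"]
print(add_char_noise(sentence, word_noise_prob=0.5, rng=random.Random(0)))
```

Because the sketch edits characters inside tokens and never adds or removes tokens, token-level labels such as POS tags stay aligned with the noised sentence.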


