OmniKnight: Multilingual Neural Machine Translation with Language-Specific Self-Distillation

  • 2022-05-03 17:57:40
  • Yichong Huang, Xiaocheng Feng, Xinwei Geng, Bing Qin


Although all-in-one-model multilingual neural machine translation (MNMT) has achieved remarkable progress in recent years, its selected best overall checkpoint fails to achieve the best performance simultaneously in all language pairs. This is because the best checkpoints for the individual language pairs (i.e., language-specific best checkpoints) are scattered across different epochs. In this paper, we present a novel training strategy dubbed Language-Specific Self-Distillation (LSSD) for bridging the gap between the language-specific best checkpoints and the overall best checkpoint. In detail, we regard each language-specific best checkpoint as a teacher to distill the overall best checkpoint. Moreover, we systematically explore three variants of our LSSD, which perform distillation statically, selectively, and adaptively. Experimental results on two widely-used benchmarks show that LSSD obtains consistent improvements across all language pairs and achieves the state-of-the-art
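The gap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy dev-loss values, the choice of dev loss as the selection criterion, and the `alpha` interpolation weight are all assumptions made for illustration. The sketch shows how language-specific best checkpoints can fall in different epochs from the overall best checkpoint, and one common way a teacher distribution can be combined with the usual training loss.

```python
import math

def language_specific_best(dev_loss_history):
    """Return {lang_pair: epoch} with the lowest dev loss for each pair.

    dev_loss_history is a list with one {lang_pair: dev_loss} dict per epoch.
    (Selecting by dev loss is an assumption of this sketch.)
    """
    best = {}
    for epoch, losses in enumerate(dev_loss_history):
        for pair, loss in losses.items():
            if pair not in best or loss < dev_loss_history[best[pair]][pair]:
                best[pair] = epoch
    return best

def overall_best(dev_loss_history):
    """Return the epoch with the lowest dev loss averaged over all pairs."""
    averages = [sum(l.values()) / len(l) for l in dev_loss_history]
    return min(range(len(averages)), key=averages.__getitem__)

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) over one token-level distribution."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def distillation_loss(nll, teacher_probs, student_probs, alpha=0.5):
    """Interpolate the ordinary NLL with a distillation term.

    alpha is a hypothetical weight, not a value taken from the paper.
    """
    return (1 - alpha) * nll + alpha * kl_divergence(teacher_probs, student_probs)

# Toy dev losses for three epochs (made-up numbers).
history = [
    {"en-de": 2.0, "en-fr": 1.5},
    {"en-de": 1.6, "en-fr": 1.7},
    {"en-de": 1.7, "en-fr": 1.4},
]
teachers = language_specific_best(history)  # one teacher epoch per language pair
student = overall_best(history)             # epoch of the overall best checkpoint
```

In this toy history the en-de pair peaks one epoch before the overall best checkpoint, so a batch of en-de data would be distilled from that earlier checkpoint while en-fr needs no separate teacher.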


