Relay Decoding: Concatenating Large Language Models for Machine Translation

Abstract

Leveraging large language models for machine translation has demonstrated promising results. However, it requires the large language models to handle both the source and target languages of the translation task. When it is difficult to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitigate these costs, we propose a novel approach called RD (Relay Decoding), which concatenates two distinct large models that individually support the source and target languages. By incorporating a simple mapping layer to connect the two models and training with a limited amount of parallel data, we achieve superior results on the machine translation task. Experimental results on the Multi30k and WikiMatrix datasets validate the effectiveness of the proposed method.
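The abstract does not specify the form of the mapping layer or how it is trained. The following is a minimal sketch that assumes the layer is a single affine projection from the source model's hidden size to the target model's embedding size. `MappingLayer`, `relay`, and the squared-error loss against fixed target vectors are hypothetical stand-ins used only for illustration. They are not the paper's architecture or its training objective on parallel data.

```python
import random
from typing import List

Vector = List[float]


class MappingLayer:
    """Hypothetical affine bridge y = W x + b from d_src (source LLM) to d_tgt (target LLM)."""

    def __init__(self, d_src: int, d_tgt: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random weights; bias starts at zero.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(d_src)] for _ in range(d_tgt)]
        self.b = [0.0] * d_tgt

    def forward(self, x: Vector) -> Vector:
        """Project one source hidden state into the target embedding space."""
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(self.W, self.b)]

    def sgd_step(self, x: Vector, target: Vector, lr: float) -> float:
        """One SGD step on 0.5 * ||W x + b - target||^2 (an assumed stand-in loss).

        Returns the loss computed before the update.
        """
        y = self.forward(x)
        err = [y_i - t_i for y_i, t_i in zip(y, target)]
        for i, e in enumerate(err):
            for j, x_j in enumerate(x):
                self.W[i][j] -= lr * e * x_j
            self.b[i] -= lr * e
        return 0.5 * sum(e * e for e in err)


def relay(source_states: List[Vector], bridge: MappingLayer) -> List[Vector]:
    """Map every source-LLM hidden state so it can be fed to the target LLM (assumed interface)."""
    return [bridge.forward(h) for h in source_states]
```

In this sketch, both large models would stay frozen and only the small bridge would be trained. That mirrors the abstract's point that relay decoding avoids costly continued training of a large model, but how the authors actually connect and train the two models should be checked against the full paper.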
