Task Arithmetic for Language Expansion in Speech Translation

Abstract

Recent advances in large language models (LLMs) have sparked interest in speech-text multimodal foundation models, which achieve strong performance on instruction-based speech translation (ST). However, expanding the language pairs of an existing instruction-tuned ST system is costly because it requires re-training on a combination of new and previous datasets. We propose to add new language pairs by merging a model trained on the new language pairs with the existing model, using task arithmetic. We find that applying task arithmetic directly to ST causes the merged model to fail to follow instructions and thus to generate translations in the wrong languages. To eliminate this language confusion, we propose an augmented task arithmetic method that merges an additional language control model, which is trained to generate the correct target language token following the instructions. Our experiments demonstrate that the proposed language control model achieves language expansion by eliminating language confusion. In our MuST-C and CoVoST-2 experiments, it improves BLEU scores by up to 4.66 and 4.92, respectively. In addition, we demonstrate that our task arithmetic framework can expand to a language pair for which neither paired ST training data nor a pre-trained ST model is available. We first synthesize the ST system from machine translation (MT) systems via task analogy, then merge the synthesized ST system into the existing ST model.
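The merging steps described above can be illustrated with a minimal sketch. It follows the general task arithmetic formulation (a task vector is the fine-tuned weights minus the shared pre-trained weights, and merging adds a scaled sum of task vectors back to the pre-trained weights). The abstract does not give the exact recipe, so the function names, the `scale` coefficient, the toy weight values, and the specific analogy form `st_b ≈ st_a + (mt_b - mt_a)` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flattened weights.
# A real system would hold tensors; plain lists keep the sketch dependency-free.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task vector: fine-tuned weights minus the shared pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Params, vectors: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of task vectors onto the pre-trained weights.

    `scale` is a hypothetical merging coefficient; its value is not
    specified in the abstract.
    """
    merged: Params = {}
    for name, base in pretrained.items():
        summed = [sum(v[name][i] for v in vectors) for i in range(len(base))]
        merged[name] = [b + scale * s for b, s in zip(base, summed)]
    return merged


def analogy(st_a: Params, mt_a: Params, mt_b: Params) -> Params:
    """Assumed task analogy on task vectors: st_b ~= st_a + (mt_b - mt_a)."""
    return {
        name: [s + (b - a) for s, a, b in zip(st_a[name], mt_a[name], mt_b[name])]
        for name in st_a
    }


# Toy weights (hypothetical values chosen to be exact in floating point).
pretrained = {"w": [1.0, 2.0]}
existing_st = {"w": [1.5, 2.0]}    # existing ST model
new_pair_st = {"w": [1.0, 3.0]}    # model trained on the new language pair
lang_control = {"w": [1.25, 2.25]}  # language control model

vectors = [
    task_vector(m, pretrained) for m in (existing_st, new_pair_st, lang_control)
]
merged_model = merge(pretrained, vectors, scale=1.0)

# Synthesizing an ST task vector for a pair with no ST data, from MT vectors.
synth_st = analogy({"w": [0.5, 0.0]}, {"w": [0.25, 0.5]}, {"w": [0.25, 1.0]})
```

In this sketch the language control model is merged as just one more task vector alongside the existing and new-pair models; how it is actually weighted against them is left open here.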
