Abstract
In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction-following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks. We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions. We also show that, through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation, and multi-modal translation can be tackled jointly by a single instruction-finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo. To the best of our knowledge, our work is among the first to demonstrate the instruction-following capabilities of traditional NMT models, which allows for faster, cheaper, and more efficient serving of customized translations.