LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation

Abstract

Multimodal Machine Translation (MMT) enhances translation quality by incorporating visual context, helping to resolve textual ambiguities. While existing MMT methods perform well in bilingual settings, extending them to multilingual translation remains challenging due to cross-lingual interference and ineffective parameter-sharing strategies. To address this, we propose LLaVA-NeuMT, a novel multimodal multilingual translation framework that explicitly models language-specific and language-agnostic representations to mitigate multilingual interference. Our approach consists of a layer selection mechanism that identifies the most informative layers for different language pairs and a neuron-level adaptation strategy that dynamically selects language-specific and agnostic neurons to improve translation quality while reducing redundancy. We conduct extensive experiments on the M3-Multi30K and M3-AmbigCaps datasets, demonstrating that LLaVA-NeuMT, while fine-tuning only 40% of the model parameters, surpasses full fine-tuning approaches and ultimately achieves SOTA results on both datasets. Our analysis further provides insights into the importance of selected layers and neurons in multimodal multilingual adaptation, offering an efficient and scalable solution to cross-lingual adaptation in multimodal translation.
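
The abstract does not say how layers or neurons are scored, so the following is only a minimal sketch of the general idea of two-stage selective fine-tuning, not the authors' implementation. It assumes that an importance score is already available for every layer and every neuron; the function names (`select_layers`, `neuron_mask`, `trainable_fraction`) and all score values are hypothetical, and the split between language-specific and language-agnostic neurons is not modelled.

```python
from typing import Dict, List


def select_layers(layer_scores: List[float], num_layers: int) -> List[int]:
    """Return the indices of the `num_layers` highest-scoring layers, ascending."""
    ranked = sorted(range(len(layer_scores)),
                    key=lambda i: layer_scores[i], reverse=True)
    return sorted(ranked[:num_layers])


def neuron_mask(neuron_scores: List[float], keep_ratio: float) -> List[bool]:
    """Mark the top `keep_ratio` fraction of neurons in one layer as trainable."""
    k = max(1, round(keep_ratio * len(neuron_scores)))
    top = set(sorted(range(len(neuron_scores)),
                     key=lambda j: neuron_scores[j], reverse=True)[:k])
    return [j in top for j in range(len(neuron_scores))]


def trainable_fraction(masks: Dict[int, List[bool]], total_neurons: int) -> float:
    """Share of all neurons that remain trainable after both selection stages."""
    return sum(sum(mask) for mask in masks.values()) / total_neurons


# Toy model: 4 layers with 5 neurons each (all scores are made up).
layer_scores = [0.1, 0.9, 0.4, 0.8]
neuron_scores = {
    1: [0.5, 0.1, 0.7, 0.3, 0.9],
    3: [0.2, 0.6, 0.4, 0.8, 0.05],
}

# Stage 1: keep the two most informative layers.
chosen = select_layers(layer_scores, num_layers=2)
# Stage 2: within each chosen layer, keep 80% of the neurons trainable.
masks = {layer: neuron_mask(neuron_scores[layer], keep_ratio=0.8) for layer in chosen}
fraction = trainable_fraction(masks, total_neurons=4 * 5)
```

In this toy setup, 8 of the 20 neurons stay trainable. The numbers were chosen so that the budget resembles the 40% figure quoted above; in a real model the masks would be used to freeze the unselected parameters during fine-tuning.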
