Abstract
Large Language Models (LLMs) excel in English, but their performance degrades significantly on low-resource languages (LRLs) due to English-centric training. While methods like LangBridge align LLMs with multilingual encoders such as the Massively Multilingual Text-to-Text Transfer Transformer (mT5), they typically use only the final encoder layer. We propose a novel architecture that fuses all intermediate layers, enriching the linguistic information passed to the LLM. Our approach features two strategies: (1) a Global Softmax weighting for overall layer importance, and (2) a Transformer Softmax model that learns token-specific weights. The fused representations are mapped into the LLM's embedding space, enabling it to process multilingual inputs. The model is trained only on English data, without using any parallel or multilingual data. Evaluated on XNLI, IndicXNLI, Sinhala News Classification, and Amazon Reviews, our Transformer Softmax model significantly outperforms the LangBridge baseline. We observe strong performance gains in LRLs, improving Sinhala classification accuracy from 71.66% to 75.86% and achieving clear improvements across Indic languages such as Tamil, Bengali, and Malayalam. These gains contribute to an overall increase in average XNLI accuracy from 70.36% to 71.50%. This approach offers a scalable, data-efficient path toward more capable and equitable multilingual LLMs.
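The two fusion strategies differ in where the softmax weights live: Global Softmax keeps one weight per encoder layer, shared by every token, while Transformer Softmax produces a separate weight distribution over layers for each token. The sketch below illustrates only that difference in plain Python. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the per-token layer scores (`token_logits`) are taken as given inputs rather than produced by a learned Transformer, and the projection into the LLM's embedding space is omitted.

```python
import math
from typing import List

Vector = List[float]
# Hypothetical layout: layers[l][t] is the d-dimensional hidden state of token t
# at encoder layer l (L layers, T tokens).
Layers = List[List[Vector]]


def softmax(logits: Vector) -> Vector:
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(layers: Layers, weights: Vector, t: int) -> Vector:
    # Combine token t's hidden states across layers using the given layer weights.
    dim = len(layers[0][t])
    return [sum(w * layer[t][k] for w, layer in zip(weights, layers)) for k in range(dim)]


def global_softmax_fusion(layers: Layers, layer_logits: Vector) -> List[Vector]:
    """One learned score per layer; the same layer weights are applied to every token."""
    weights = softmax(layer_logits)
    return [weighted_sum(layers, weights, t) for t in range(len(layers[0]))]


def token_softmax_fusion(layers: Layers, token_logits: List[Vector]) -> List[Vector]:
    """A separate layer distribution per token. token_logits[t][l] scores layer l
    for token t; in the described model these scores would come from a learned
    Transformer, which is assumed (not modeled) here."""
    return [weighted_sum(layers, softmax(token_logits[t]), t) for t in range(len(layers[0]))]


if __name__ == "__main__":
    # Toy example: 2 layers, 2 tokens, 2-dimensional hidden states.
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],  # layer 0
        [[3.0, 2.0], [2.0, 3.0]],  # layer 1
    ]
    print(global_softmax_fusion(layers, [0.0, 0.0]))
    print(token_softmax_fusion(layers, [[0.0, 0.0], [0.0, math.log(3.0)]]))
```

With equal global scores each token receives the average of its two layer states, whereas the token-specific variant lets the second token put three times as much weight on layer 1 as on layer 0 while the first token stays evenly balanced.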