Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

Abstract

Adapting medical Large Language Models to local languages can reduce barriers to accessing healthcare services, but data scarcity remains a significant challenge, particularly for low-resource languages. To address this, we first construct a high-quality medical dataset and analyze it to ensure its quality. To leverage the generalization capability of multilingual LLMs and scale efficiently to more resource-constrained languages, we explore the internal information flow of LLMs from a multilingual perspective using Mixture of Experts (MoE) modularity. Technically, we propose a novel MoE routing method that employs language-specific experts and cross-lingual routing. Inspired by circuit theory, our routing analysis revealed a "Spread Out in the End" information flow mechanism: while earlier layers concentrate cross-lingual information flow, the later layers exhibit language-specific divergence. This insight directly led to the development of the Post-MoE architecture, which applies sparse routing only in the later layers while keeping the other layers dense. Experimental results demonstrate that this approach enhances the generalization of multilingual models to other languages while preserving interpretability. Finally, to efficiently scale the model to 50 languages, we introduce the concept of language family experts, drawing on linguistic priors, which enables scaling the number of languages without adding parameters.
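
The two structural ideas in the abstract, sparse routing only in the later layers and one expert per language family, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `LANGUAGE_FAMILY` table, the layer counts, and the hard family lookup (which stands in for the learned router) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical language-to-family table (illustrative only, not the paper's grouping).
LANGUAGE_FAMILY: Dict[str, str] = {
    "en": "germanic",
    "de": "germanic",
    "es": "romance",
    "fr": "romance",
    "hi": "indo_aryan",
    "bn": "indo_aryan",
}

# One expert per family, so the expert count depends on families, not languages.
FAMILIES: List[str] = sorted(set(LANGUAGE_FAMILY.values()))


def layer_plan(num_layers: int, num_moe_layers: int) -> List[str]:
    """Label each layer 'dense' or 'moe', placing sparse routing only in the last layers."""
    if not 0 <= num_moe_layers <= num_layers:
        raise ValueError("num_moe_layers must be between 0 and num_layers")
    return ["dense"] * (num_layers - num_moe_layers) + ["moe"] * num_moe_layers


def expert_index(language: str) -> int:
    """Map a language code to the index of its family expert.

    Assumption: a hard lookup replaces the learned routing described in the abstract.
    """
    return FAMILIES.index(LANGUAGE_FAMILY[language])
```

Under these assumptions, `layer_plan(6, 2)` marks the first four layers as dense and the last two as MoE, and `expert_index("es")` equals `expert_index("fr")` because both languages share one family expert. Adding a new language to an existing family only extends the lookup table, which is the sense in which the language count can grow without adding experts.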
