Abstract
Vocabulary adaptation, which integrates new vocabulary into pre-trained language models, enables expansion to new languages and mitigates token over-fragmentation. However, existing approaches are limited by their reliance on heuristics or external embeddings. We propose VocADT, a novel method for vocabulary adaptation using adapter modules that are trained to learn the optimal linear combination of existing embeddings while keeping the model's weights fixed. VocADT offers a flexible and scalable solution without depending on external resources or language constraints. Across 11 languages with diverse scripts, resource availability, and fragmentation, we demonstrate that VocADT outperforms the original Mistral model and other baselines across various multilingual tasks, including natural language understanding and machine translation. We find that Latin-script languages and highly fragmented languages benefit the most from vocabulary adaptation. We further fine-tune the adapted model on the generative task of machine translation and find that vocabulary adaptation is still beneficial after fine-tuning and that VocADT is the most effective.
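
As a rough illustration of the core idea, the following minimal sketch expresses each new-vocabulary embedding as a learned linear combination of the existing embeddings, updating only the adapter matrix while the original embeddings stay fixed. It is not the VocADT implementation: the function names, the toy dimensions, the zero initialization, and the squared-error objective (a stand-in for a language-modeling loss) are all assumptions made for illustration.

```python
import numpy as np


def adapt_embeddings(adapter, old_embeddings):
    """Return new-token embeddings as linear combinations of old embeddings.

    adapter:        (new_vocab, old_vocab) trainable mixing weights
    old_embeddings: (old_vocab, dim) frozen embeddings of the original model
    """
    return adapter @ old_embeddings


def train_adapter(adapter, old_embeddings, targets, lr=0.1, steps=200):
    """Toy gradient descent that updates only the adapter.

    The squared-error loss against `targets` is a hypothetical stand-in;
    a real setup would backpropagate a language-modeling loss instead.
    """
    adapter = adapter.copy()
    for _ in range(steps):
        residual = adapter @ old_embeddings - targets        # (new_vocab, dim)
        grad = 2.0 * residual @ old_embeddings.T / targets.size
        adapter -= lr * grad                                 # old_embeddings untouched
    return adapter


rng = np.random.default_rng(0)
old_vocab, new_vocab, dim = 6, 4, 3                          # toy sizes (assumed)

old_embeddings = rng.normal(size=(old_vocab, dim))           # frozen model weights
frozen_copy = old_embeddings.copy()                          # kept to verify nothing changed

# Synthetic targets that are exactly reachable by some linear combination.
targets = rng.normal(size=(new_vocab, old_vocab)) @ old_embeddings

initial_adapter = np.zeros((new_vocab, old_vocab))
initial_loss = np.mean((adapt_embeddings(initial_adapter, old_embeddings) - targets) ** 2)

trained_adapter = train_adapter(initial_adapter, old_embeddings, targets)
new_embeddings = adapt_embeddings(trained_adapter, old_embeddings)
final_loss = np.mean((new_embeddings - targets) ** 2)
```

In this sketch the adapter has one row per new token and one column per original token, so the number of trainable parameters depends only on the two vocabulary sizes and not on the rest of the model.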