Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States

Abstract

Although large language models (LLMs) have transformed our expectations of modern language technologies, concerns over data privacy often restrict the use of commercially available LLMs hosted outside of EU jurisdictions. This limits their application in governmental, defence, and other data-sensitive sectors. In this work, we evaluate the extent to which locally deployable open-weight LLMs support lesser-spoken languages such as Lithuanian, Latvian, and Estonian. We examine various size and precision variants of the top-performing multilingual open-weight models, Llama 3, Gemma 2, Phi, and NeMo, on machine translation, multiple-choice question answering, and free-form text generation. The results indicate that while certain models like Gemma 2 perform close to the top commercially available models, many LLMs struggle with these languages. Most surprisingly, however, we find that these models, while showing close to state-of-the-art translation performance, are still prone to lexical hallucinations with errors in at least 1 in 20 words for all open-weight multilingual LLMs.
