Komodo: A Linguistic Expedition into Indonesia's Regional Languages

  • 2024-03-19 06:49:01
  • Louis Owen, Vishesh Tripathi, Abhay Kumar, Biddwan Ahmed
The recent breakthroughs in Large Language Models (LLMs) have mostly focused on languages with easily available and sufficient resources, such as English. However, there remains a significant gap for languages that lack sufficient linguistic resources in the public domain. Our work introduces Komodo-7B, a family of 7-billion-parameter Large Language Models designed to address this gap by operating seamlessly across Indonesian, English, and 11 regional languages in Indonesia. The family consists of Komodo-7B-Base and Komodo-7B-Instruct. Komodo-7B-Instruct stands out by achieving state-of-the-art performance in various tasks and languages, outperforming the benchmarks set by OpenAI's GPT-3.5, Cohere's Aya-101, Llama-2-Chat-13B, Mixtral-8x7B-Instruct-v0.1, Gemma-7B-it, and many more. This model not only demonstrates superior performance in both language-specific and overall assessments but also highlights its capability to excel across linguistically diverse settings. Our commitment to advancing language models extends beyond well-resourced languages, aiming to bridge the gap for those with limited linguistic assets. Additionally, Komodo-7B-Instruct's better cross-language understanding contributes to addressing educational disparities in Indonesia, offering direct translations from English to 11 regional languages, a significant improvement compared to existing language translation services. Komodo-7B represents a crucial step towards inclusivity and effectiveness in language models, catering to the linguistic needs of diverse communities.


