Abstract
We present Eir-8B, a large language model with 8 billion parameters, specifically designed to enhance the accuracy of handling medical tasks in the Thai language. This model focuses on providing clear and easy-to-understand answers for both healthcare professionals and patients, thereby improving the efficiency of diagnosis and treatment processes. Human evaluation was conducted to ensure that the model adheres to care standards and provides unbiased answers. To prioritize data security, the model is deployed within the hospital's internal network, ensuring both high security and faster processing speeds. The internal API connection is secured with encryption and strict authentication measures to prevent data leaks and unauthorized access. We evaluated several open-source large language models with 8 billion parameters on four medical benchmarks: MedQA, MedMCQA, PubMedQA, and the medical subset of MMLU. The best-performing baselines were used to develop Eir-8B. Our evaluation employed multiple questioning strategies, including zero-shot, few-shot, chain-of-thought reasoning, and ensemble/self-consistency voting methods. Our model outperformed commercially available Thai-language large language models by more than 10%. In addition, we developed enhanced model testing tailored for clinical use in Thai across 18 clinical tasks, where our model exceeded GPT-4o performance by more than 11%.
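As an illustration of the ensemble/self-consistency voting strategy named above, the following is a minimal sketch of majority voting over several sampled answers to one multiple-choice question. It is not the evaluation code used for Eir-8B: the function name `self_consistency_vote`, the single-letter answer format, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def self_consistency_vote(sampled_answers):
    """Pick the most frequent answer among several sampled model outputs.

    `sampled_answers` is a list of answer labels (assumed here to be option
    letters such as "A" to "D"), one per sampled chain-of-thought completion.
    Returns the winning label and the share of samples that agreed with it.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")

    # Normalize whitespace and case so that "b " and "B" count as one vote.
    normalized = [answer.strip().upper() for answer in sampled_answers]

    counts = Counter(normalized)
    # most_common sorts by count; on a tie, the label seen first wins
    # (an assumption of this sketch, not a rule taken from the paper).
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(normalized)


# Hypothetical example: five sampled answers to the same question.
answer, agreement = self_consistency_vote(["B", "b ", "C", "B", "A"])
```

The agreement share gives a rough indication of how consistent the sampled reasoning paths were, which can help flag questions that merit closer review.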