SINA-BERT: A pre-trained Language Model for Analysis of Medical Texts in Persian

Abstract

We have released Sina-BERT, a language model pre-trained on BERT (Devlin etal., 2018) to address the lack of a high-quality Persian language model in themedical domain. SINA-BERT utilizes pre-training on a large-scale corpus ofmedical contents including formal and informal texts collected from a varietyof online resources in order to improve the performance on health-care relatedtasks. We employ SINA-BERT to complete following representative tasks:categorization of medical questions, medical sentiment analysis, and medicalquestion retrieval. For each task, we have developed Persian annotated datasets for training and evaluation and learnt a representation for the data ofeach task especially complex and long medical questions. With the samearchitecture being used across tasks, SINA-BERT outperforms BERT-based modelsthat were previously made available in the Persian language.

Quick Read (beta)

loading the full paper ...