SINA-BERT: A pre-trained Language Model for Analysis of Medical Texts in Persian

  • 2021-04-15 17:22:27
  • Nasrin Taghizadeh, Ehsan Doostmohammadi, Elham Seifossadat, Hamid R. Rabiee, Maedeh S. Tahaei
  • 5

Abstract

We have released SINA-BERT, a language model pre-trained on BERT (Devlin et al., 2018), to address the lack of a high-quality Persian language model in the medical domain. SINA-BERT utilizes pre-training on a large-scale corpus of medical content, including formal and informal texts collected from a variety of online resources, in order to improve performance on health-care-related tasks. We employ SINA-BERT on the following representative tasks: categorization of medical questions, medical sentiment analysis, and medical question retrieval. For each task, we have developed annotated Persian datasets for training and evaluation and learnt a representation for the data of each task, especially complex and long medical questions. With the same architecture being used across tasks, SINA-BERT outperforms BERT-based models that were previously made available in the Persian language.

 
