ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

  • 2021-11-21 07:53:23
  • Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak
The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most works on Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we propose the ViraPart framework, which uses embedded ParsBERT at its core for text clarification. First, we use the BERT variant for Persian followed by a classifier layer for the classification procedures. Next, we combine the models' outputs to produce clear text. In the end, the proposed models for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction achieve averaged macro F1 scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.
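
The abstract states that the outputs of the three models are combined into clear text but does not say how. The following is a minimal sketch of one way such a merge could work, assuming each model emits one label per token. The function name `merge_predictions`, the label encodings, and the choice to render Ezafe with a kasra mark are all hypothetical simplifications for illustration, not the authors' implementation.

```python
ZWNJ = "\u200c"   # Zero-Width Non-Joiner
KASRA = "\u0650"  # vowel mark used here to show Ezafe (a simplification)


def merge_predictions(tokens, zwnj_labels, ezafe_labels, punct_labels):
    """Combine three per-token label sequences into one refined string.

    Assumed (hypothetical) label encodings:
      zwnj_labels[i]  - truthy if token i joins the next token with a ZWNJ
      ezafe_labels[i] - truthy if token i takes an Ezafe marker
      punct_labels[i] - punctuation string to append after token i ("" if none)
    """
    n = len(tokens)
    if not (len(zwnj_labels) == len(ezafe_labels) == len(punct_labels) == n):
        raise ValueError("all label sequences must match the token count")

    pieces = []
    for i, token in enumerate(tokens):
        piece = token
        if ezafe_labels[i]:
            piece += KASRA
        piece += punct_labels[i]
        pieces.append(piece)
        # Separate from the next token with a ZWNJ or a plain space.
        if i < n - 1:
            pieces.append(ZWNJ if zwnj_labels[i] else " ")
    return "".join(pieces)


if __name__ == "__main__":
    tokens = ["کتاب", "علی", "گم", "می", "شود"]
    refined = merge_predictions(
        tokens,
        zwnj_labels=[0, 0, 0, 1, 0],       # join "می" and "شود"
        ezafe_labels=[1, 0, 0, 0, 0],      # Ezafe after "کتاب"
        punct_labels=["", "", "", "", "."],
    )
    print(refined)
```

In a real pipeline the three label sequences would come from the ParsBERT-based classifiers; here they are hard-coded so the merge step can be read in isolation.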


