Synthetic Error Dataset Generation Mimicking Bengali Writing Pattern

  • 2020-05-21 15:49:16
  • Md. Habibur Rahman Sifat, Chowdhury Rafeed Rahman, Mohammad Rafsan, Md. Hasibur Rahman

Abstract

While writing Bengali using an English keyboard, users often make spelling mistakes. The accuracy of any Bengali spell checker or paragraph correction module depends largely on the kind of error dataset it is based on. Manual generation of such an error dataset is a cumbersome process. In this research, we present an algorithm for automatically generating misspelled Bengali words from correct words by analyzing the Bengali writing pattern on a QWERTY-layout English keyboard. As part of our analysis, we have formed a list of the most commonly used Bengali words, phonetically similar replaceable clusters, frequently mispressed replaceable clusters, frequently mispressed insertion-prone clusters, and some rules for handling Juktakkhar (consonant letter clusters) while generating errors.
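
The idea of generating misspellings from replaceable clusters can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `single_substitution_errors` and the contents of `PHONETIC_CLUSTERS` are assumptions chosen for illustration (the letters in each set are commonly confused because they sound alike), and the sketch ignores keyboard mispress clusters, insertion errors, and Juktakkhar handling.

```python
from typing import List, Sequence, Set

# Hypothetical clusters of phonetically similar Bengali letters.
# These are illustrative only, not the cluster list from the paper.
PHONETIC_CLUSTERS: List[Set[str]] = [
    {"শ", "ষ", "স"},
    {"ন", "ণ"},
    {"জ", "য"},
]


def single_substitution_errors(word: str, clusters: Sequence[Set[str]]) -> List[str]:
    """Return every variant of `word` with exactly one character
    replaced by another member of the cluster it belongs to."""
    variants: List[str] = []
    for i, ch in enumerate(word):
        for cluster in clusters:
            if ch in cluster:
                # Sort the alternatives so the output order is deterministic.
                for alt in sorted(cluster - {ch}):
                    variants.append(word[:i] + alt + word[i + 1:])
    return variants


if __name__ == "__main__":
    # Each printed word differs from the input by one cluster substitution.
    for misspelled in single_substitution_errors("সকাল", PHONETIC_CLUSTERS):
        print(misspelled)
```

Enumerating every single substitution keeps the sketch deterministic; a dataset generator would more plausibly sample from these variants, possibly weighting clusters by how often each confusion occurs.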

 
