Offline Extraction of Indic Regional Language from Natural Scene Image using Text Segmentation and Deep Convolutional Sequence

  • 2018-07-06 20:10:03
  • Sauradip Nag, Pallab Kumar Ganguly, Sumit Roy, Sourab Jha, Krishna Bose, Abhishek Jha, Kousik Dasgupta

Abstract

Regional language extraction from a natural scene image is challenging because it depends on the text information extracted from the image. Text extraction, in turn, is affected by varying lighting conditions, arbitrary orientation, inadequate text information, heavy background influence over the text, and changes in text appearance. This paper presents a novel unified method for tackling these challenges. The proposed work applies an image correction and segmentation technique on top of an existing text detection pipeline, the Efficient and Accurate Scene Text Detector (EAST). EAST uses the standard PVANet architecture to select features and non-maximal suppression to detect text in the image. Text recognition is done using a combined architecture of a MaxOut convolutional neural network (CNN) and a bidirectional long short-term memory (LSTM) network. After the text is recognized with this deep-learning-based approach, the native-language text is translated to English and tokenized using standard text tokenizers. The tokens that very likely represent a location are used to find the Global Positioning System (GPS) coordinates of that location, and the regional languages spoken there are subsequently extracted. The proposed method is tested on a self-generated dataset collected from a Government of India dataset and is also evaluated on a standard dataset. A comparative study with a few state-of-the-art methods for text detection, text recognition, and extraction of regional language from images shows that the proposed method outperforms the existing methods.
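
The non-maximal suppression step mentioned above can be illustrated with a minimal sketch. It shows the simplest greedy form on axis-aligned boxes and is not taken from the paper: the EAST detector as published uses a locality-aware variant over rotated boxes or quadrilaterals. The names `iou` and `nms` and the 0.5 overlap threshold are assumptions made only for this illustration.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); an assumption for illustration,
# since scene-text detectors often output rotated boxes instead.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box above the threshold.
    Returns the indices of the kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping boxes and one separate box.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = nms(example_boxes, example_scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.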

 
