Transfer Learning for Scene Text Recognition in Indian Languages

Abstract

Scene text recognition in low-resource Indian languages is challenging because of complexities like multiple scripts, fonts, text sizes, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages. We perform experiments on the conventional CRNN model and STAR-Net to ensure generalisability. To study the effect of changing scripts, we initially run our experiments on synthetic word images rendered using Unicode fonts. We show that the transfer of English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose to apply transfer learning techniques among Indian languages due to the similarity in their n-gram distributions and visual features like the vowels and conjunct characters. We then study transfer learning among six Indian languages with varying complexities in fonts and word length statistics. We also demonstrate that the learned features of the models transferred from other Indian languages are visually closer to (and sometimes even better than) the individual model features than those transferred from English. We finally set new benchmarks for scene-text recognition on the Hindi, Telugu, and Malayalam datasets from IIIT-ILST and the Bangla dataset from MLT-17 by achieving 6%, 5%, 2%, and 23% gains in Word Recognition Rates (WRRs) compared to previous works. We further improve the MLT-17 Bangla results by plugging a novel correction BiLSTM into our model. We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words. WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets, respectively.
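
The Word Recognition Rate used for the results above is the fraction of word images whose predicted transcription matches the ground truth exactly. A minimal sketch of that metric in Python follows; the function name `word_recognition_rate` and the sample words are hypothetical illustrations, not taken from the paper, and any Unicode normalisation the authors may apply before comparison is not modelled here.

```python
def word_recognition_rate(predictions, ground_truths):
    """Return the fraction of words whose prediction exactly matches the label.

    Hypothetical helper for illustration: compares strings as-is,
    with no Unicode normalisation or case folding.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        # No words to score; report 0.0 rather than dividing by zero.
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


if __name__ == "__main__":
    # Made-up Hindi word labels; the third prediction is wrong.
    labels = ["नमस्ते", "भारत", "दिल्ली", "स्कूल"]
    preds = ["नमस्ते", "भारत", "दिली", "स्कूल"]
    print(f"WRR = {word_recognition_rate(preds, labels):.2%}")
```

Because the match is all-or-nothing per word, a single wrong vowel sign or conjunct makes the whole word count as an error, which is why WRR is a stricter measure than character-level accuracy.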
