Visual Re-ranking with Natural Language Understanding for Text Spotting

  • 2018-10-29 10:41:45
  • Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró
Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (a unigram language model) and the semantic correlation between scene and text. For this, we initially rely on an off-the-shelf deep neural network, already trained with a large amount of data, which provides a series of text hypotheses per input image. These hypotheses are then re-ranked using word frequencies and semantic relatedness with objects or scenes in the image. As a result of this combination, the performance of the original network is boosted at almost no additional cost. We validate our approach on the ICDAR'17 dataset.
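The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the three signals are combined, so summing log-probabilities is an assumption made here for illustration. The function name `rerank`, the probability floor, and all toy values (hypotheses, unigram probabilities, relatedness scores, the context label "pub") are hypothetical.

```python
import math

FLOOR = 1e-9  # assumed smoothing floor so that log() never receives zero


def rerank(hypotheses, unigram_prob, relatedness, visual_context):
    """Re-rank (word, visual_score) hypotheses from a text spotter.

    Assumption: the final score is the sum of the log visual score,
    the log unigram probability, and the log of the best semantic
    relatedness between the word and any detected object or scene label.
    """
    scored = []
    for word, p_visual in hypotheses:
        p_lm = max(unigram_prob.get(word, 0.0), FLOOR)
        p_sem = max(
            max((relatedness(word, label) for label in visual_context), default=0.0),
            FLOOR,
        )
        score = math.log(max(p_visual, FLOOR)) + math.log(p_lm) + math.log(p_sem)
        scored.append((word, score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (all values are made up for illustration).
toy_hypotheses = [("bur", 0.45), ("bar", 0.40), ("ban", 0.15)]
toy_unigram = {"bar": 1e-4, "ban": 5e-5, "bur": 1e-7}
toy_relatedness_table = {("bar", "pub"): 0.8, ("ban", "pub"): 0.1, ("bur", "pub"): 0.05}


def toy_relatedness(word, label):
    # Stand-in for a real semantic relatedness measure (e.g. embedding similarity).
    return toy_relatedness_table.get((word, label), 0.0)


toy_ranking = rerank(toy_hypotheses, toy_unigram, toy_relatedness, ["pub"])
print([word for word, _ in toy_ranking])
```

In this toy case the visually preferred hypothesis "bur" drops below "bar" once word frequency and relatedness to the detected scene are taken into account. In practice the relative weight of each signal would likely need tuning.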


