Abstract
An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in a multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.