Abstract
This paper presents a scene text detection technique that exploits bootstrapping and text border semantics for accurate localization of texts in scenes. A novel bootstrapping technique is designed which samples multiple 'subsections' of a word or text line and thereby effectively relieves the constraint of limited training data. At the same time, the repeated sampling of text 'subsections' improves the consistency of the predicted text feature maps, which is critical in predicting a single complete box instead of multiple broken boxes for long words or text lines. In addition, a semantics-aware text border detection technique is designed which produces four types of text border segments for each scene text. With semantics-aware text borders, scene texts can be localized more accurately by regressing only the text pixels around the ends of words or text lines, rather than all text pixels, which often leads to inaccurate localization for long words or text lines. Extensive experiments demonstrate the effectiveness of the proposed techniques, and superior performance is obtained on several public datasets, e.g., an f-score of 80.1 on MSRA-TD500 and 67.1 on ICDAR2017-RCTW.
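As a rough illustration of the 'subsection' sampling idea, the following minimal sketch draws several random horizontal sub-boxes from one annotated text line. The abstract does not specify the sampling scheme, so everything here is an assumption made for illustration: the axis-aligned box format, the function name `sample_subsections`, and the parameters `num_samples` and `min_frac` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def sample_subsections(
    box: Box,
    num_samples: int = 4,
    min_frac: float = 0.3,
    rng: Optional[random.Random] = None,
) -> List[Box]:
    """Sample random horizontal subsections of one text-line box.

    Each subsection keeps the full height of the line and covers a
    random fraction (between min_frac and 1.0) of its width. This is
    a hypothetical stand-in for the paper's bootstrapping step.
    """
    rng = rng or random.Random()
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min

    subsections = []
    for _ in range(num_samples):
        # Pick how much of the line this subsection covers.
        sub_width = rng.uniform(min_frac, 1.0) * width
        # Pick a start so the subsection stays inside the line.
        start = rng.uniform(x_min, x_max - sub_width)
        subsections.append((start, y_min, start + sub_width, y_max))
    return subsections


if __name__ == "__main__":
    line_box = (10.0, 5.0, 110.0, 25.0)
    for sub in sample_subsections(line_box, num_samples=3, rng=random.Random(0)):
        print(sub)
```

Under these assumptions, a single annotated line yields several overlapping training samples, which is one plausible way repeated sampling could augment limited training data.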