In this paper, we propose the Double Supervised Network with Attention Mechanism (DSAN), a novel end-to-end trainable framework for scene text recognition. It incorporates a text attention module during feature extraction, which forces the model to focus on text regions, and the whole framework is supervised by two branches. One supervision branch comes from context-level modelling; the other is an extra supervision enhancement branch that aims to tackle inexplicit semantic information at the character level. These two supervisions benefit each other and yield better performance. The proposed approach can recognize text of arbitrary length and does not need any predefined lexicon. Our method outperforms the current state-of-the-art methods on three text recognition benchmarks, IIIT5K, ICDAR2013 and SVT, reaching accuracies of 88.6%, 92.3% and 84.1% respectively, which suggests the effectiveness of the proposed method.