Transfer learning approaches for Neural Machine Translation (NMT) train an NMT model on the assisting-target language pair (parent model), which is later fine-tuned for the source-target language pair of interest (child model), with the target language being the same. In many cases, the assisting language has a different word order from the source language. We show that divergent word order limits the benefits of transfer learning when little to no parallel corpus between the source and target languages is available. To bridge this divergence, we propose to pre-order the assisting language sentences to match the word order of the source language and then train the parent model. Our experiments on many language pairs show that bridging the word order gap leads to significant improvements in translation quality.
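
As a rough illustration of what pre-ordering means, the sketch below reorders a toy English (subject-verb-object) sentence into subject-object-verb order before it would be used to train a parent model. This is a minimal sketch, not the pre-ordering system used in this work: the role tags (`S`, `V`, `O`), the `preorder` function, and the example sentence are all hypothetical, and a practical system would derive the reordering from a syntactic parse rather than from hand-assigned tags.

```python
from typing import List, Tuple

# Hypothetical representation: each token carries a coarse constituent role,
# "S" (subject), "V" (verb) or "O" (object). Real pre-ordering would obtain
# this structure from a parser; here the tags are assigned by hand.
Token = Tuple[str, str]


def preorder(tokens: List[Token], target_order: str = "SOV") -> List[str]:
    """Reorder role-tagged tokens so constituents follow target_order."""
    rank = {role: i for i, role in enumerate(target_order)}
    # sorted() is stable, so words inside one constituent keep their order.
    return [word for word, role in sorted(tokens, key=lambda t: rank[t[1]])]


# Toy assisting-language sentence in SVO order.
english = [("the", "S"), ("boy", "S"), ("eats", "V"), ("an", "O"), ("apple", "O")]

print(" ".join(preorder(english)))  # the boy an apple eats
```

The reordered sentences would replace the original assisting-language side of the parent training data, so that the parent encoder sees input in the word order of the source language.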