The Evolved Transformer

  • 2019-01-30 22:03:01
  • David R. So, Chen Liang, Quoc V. Le
  • 53


Recent works have highlighted the strengths of the Transformer architecture for dealing with sequence tasks. At the same time, neural architecture search has advanced to the point where it can outperform human-designed models. The goal of this work is to use architecture search to find a better Transformer architecture. We first construct a large search space inspired by the recent advances in feed-forward sequential models and then run evolutionary architecture search, seeding our initial population with the Transformer. To effectively run this search on the computationally expensive WMT 2014 English-German translation task, we develop the progressive dynamic hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments - the Evolved Transformer - demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At big model size, the Evolved Transformer is twice as efficient as the Transformer in FLOPS without loss in quality. At a much smaller - mobile-friendly - model size of ~7M parameters, the Evolved Transformer outperforms the Transformer by 0.7 BLEU on WMT'14 English-German.
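
The abstract describes progressive dynamic hurdles only at a high level: more training resources go to candidates that look more promising. The Python sketch below illustrates one way such a scheme could work and is not the authors' implementation. It assumes each candidate is trained in stages and continues to the next stage only if its fitness exceeds a hurdle, and that a hurdle is set to the mean fitness of previously evaluated candidates. The names `train_with_hurdles`, `mean_hurdle` and `make_toy_candidate`, as well as the toy fitness function, are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def train_with_hurdles(
    train_and_eval: Callable[[int], float],
    hurdles: Sequence[float],
    step_increments: Sequence[int],
) -> Tuple[float, int]:
    """Train one candidate in stages, stopping early if it misses a hurdle.

    train_and_eval(n) is assumed to train the candidate for n additional
    steps and return its current fitness (higher is better).
    step_increments must have one more entry than hurdles.
    """
    if len(step_increments) != len(hurdles) + 1:
        raise ValueError("need exactly one more step increment than hurdles")
    total_steps = 0
    fitness = float("-inf")
    for stage, steps in enumerate(step_increments):
        fitness = train_and_eval(steps)
        total_steps += steps
        # A candidate that does not clear the hurdle gets no more resources.
        if stage < len(hurdles) and fitness <= hurdles[stage]:
            break
    return fitness, total_steps


def mean_hurdle(fitnesses: Sequence[float]) -> float:
    """Assumed hurdle rule: the mean fitness of already-evaluated candidates."""
    return mean(fitnesses)


def make_toy_candidate(quality: float) -> Callable[[int], float]:
    """Hypothetical stand-in for real training: fitness grows with steps."""
    state = {"steps": 0}

    def train_and_eval(steps: int) -> float:
        state["steps"] += steps
        return quality * state["steps"]

    return train_and_eval


if __name__ == "__main__":
    hurdles = [6.0]
    increments = [10, 20]
    for quality in (0.5, 1.0):
        result = train_with_hurdles(make_toy_candidate(quality), hurdles, increments)
        print(quality, result)
```

In this toy run the weaker candidate stops after the first stage, while the stronger one clears the hurdle and receives the extra training steps, which is the resource-allocation behaviour the abstract attributes to the method.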

