TorchScale: Transformers at Scale

  • 2022-11-23 17:58:51
  • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei


Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several modeling techniques, which can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at

