Abstract
Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries for scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.