Reweighted Proximal Pruning for Large-Scale Language Representation

Abstract

Recently, pre-trained language representations such as BERT have become the mainstay of the natural language understanding community. These pre-trained language representations achieve state-of-the-art results on a wide range of downstream tasks. Along with continued, significant performance improvements, the size and complexity of these pre-trained neural models continue to increase rapidly. Is it possible to compress these large-scale language representation models? How will the pruned language representation affect the downstream multi-task transfer learning objectives? In this paper, we propose Reweighted Proximal Pruning (RPP), a new pruning method specifically designed for large-scale language representation models. Through experiments on SQuAD and the GLUE benchmark suite, we show that proximally pruned BERT maintains high accuracy on both the pre-training task and multiple downstream fine-tuning tasks at high pruning ratios. RPP provides a new perspective for analyzing what large-scale language representations might learn. Additionally, RPP makes it possible to deploy a large state-of-the-art language representation model such as BERT on a range of different devices (e.g., online servers, mobile phones, and edge devices).
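The method's name refers to two standard ingredients: a reweighted ℓ1 penalty, in which each weight's penalty coefficient is inversely proportional to its current magnitude, and a proximal step, which applies that penalty through an elementwise soft-threshold rather than through the gradient. The sketch below illustrates only those generic ingredients under stated assumptions; it is not the authors' implementation. The function names `reweight`, `prox_weighted_l1`, and `prune_ratio`, the smoothing constant `eps`, and the values of `lam` and `step` are all hypothetical. The gradient (optimizer) step that would precede each proximal step during training is omitted.

```python
import math
from typing import List, Sequence


def reweight(prev_weights: Sequence[float], eps: float = 1e-3) -> List[float]:
    """Hypothetical reweighting: alpha_i = 1 / (|w_i| + eps).

    Small weights receive large penalty coefficients, so they are pushed
    toward zero more aggressively than large weights.
    """
    return [1.0 / (abs(w) + eps) for w in prev_weights]


def prox_weighted_l1(weights: Sequence[float], alphas: Sequence[float],
                     lam: float, step: float) -> List[float]:
    """Proximal operator of step * lam * sum_i alpha_i * |w_i|.

    For a weighted l1 penalty this is elementwise soft-thresholding:
    w_i -> sign(w_i) * max(|w_i| - step * lam * alpha_i, 0).
    """
    out = []
    for w, a in zip(weights, alphas):
        threshold = step * lam * a  # per-weight threshold
        out.append(math.copysign(max(abs(w) - threshold, 0.0), w))
    return out


def prune_ratio(weights: Sequence[float]) -> float:
    """Fraction of weights that are exactly zero (i.e., pruned)."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.8, -0.05, 0.02, -0.6]
    alphas = reweight(w, eps=1e-3)
    pruned = prox_weighted_l1(w, alphas, lam=0.05, step=0.1)
    print(pruned, prune_ratio(pruned))
```

With these illustrative values, the two small weights fall below their enlarged thresholds and are set to zero, while the two large weights shrink only slightly, giving a pruning ratio of 0.5. Applying the proximal step separately from the gradient update is what lets a sketch like this produce exact zeros instead of merely small values.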
