Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Abstract

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), bringing significant improvements to various applications. To fully leverage nearly unlimited corpora and capture linguistic information at multiple levels, large LMs are required; but for a specific task, only part of this information is useful. Such large LMs, even at inference time, can impose heavy computational workloads, making them too time-consuming for large-scale applications. Here we propose to compress bulky LMs while preserving the information that is useful for a specific task. As different layers of the model keep different information, we develop a layer selection method for model pruning using sparsity-inducing regularization. By introducing dense connectivity, we can detach any layer without affecting the others, and stretch shallow and wide LMs to be deep and narrow. During training, LMs are learned with layer-wise dropout for better robustness. Experiments on two benchmark datasets demonstrate the effectiveness of our method.
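
The three ingredients named in the abstract (layer selection through a sparsity-inducing penalty, dense connectivity, and layer-wise dropout) can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a simplified reading in which the final representation is the concatenation of every layer's output, each scaled by a scalar gate, and in which an L1 penalty on the gates stands in for the sparsity-inducing regularizer. All function names, the gate formulation, and the pruning threshold are hypothetical.

```python
import random


def layerwise_dropout(num_layers, drop_prob, rng):
    """Pick the layers kept for one training step (assumed scheme).

    Each layer is dropped as a whole with probability drop_prob;
    if every layer is dropped, one is kept at random.
    """
    kept = [i for i in range(num_layers) if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.randrange(num_layers)]
    return kept


def gated_concat(layer_outputs, gates, kept=None):
    """Concatenate gate-scaled layer outputs (assumed dense connectivity).

    Because each layer contributes its own slice of the result,
    leaving a layer out does not change the slices of the others.
    """
    if kept is None:
        kept = range(len(layer_outputs))
    representation = []
    for i in kept:
        representation.extend(gates[i] * x for x in layer_outputs[i])
    return representation


def l1_penalty(gates, strength):
    """L1 penalty on the gates, a stand-in sparsity-inducing regularizer."""
    return strength * sum(abs(g) for g in gates)


def select_layers(gates, threshold=1e-3):
    """Keep only layers whose gate magnitude exceeds a hypothetical threshold."""
    return [i for i, g in enumerate(gates) if abs(g) > threshold]


if __name__ == "__main__":
    outputs = [[1.0, 2.0], [3.0], [4.0, 5.0]]  # toy per-layer outputs
    gates = [1.0, 0.0, 0.5]                    # toy gates after training
    kept = select_layers(gates)
    print(kept, gated_concat(outputs, gates, kept))
```

In this toy setup a gate driven to zero by the penalty marks its layer for removal at inference time, while dropping random layers during training exposes the downstream model to representations with missing layers.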
