MicroNet for Efficient Language Modeling

Abstract

It is important to design compact language models for efficient deployment. We improve upon recent advances in both the language modeling domain and the model-compression domain to construct parameter- and computation-efficient language models. We use an efficient transformer-based architecture with adaptive embedding and softmax, differentiable non-parametric cache, Hebbian softmax, knowledge distillation, network pruning, and low-bit quantization. In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track. Compared to the baseline language model provided by the MicroNet Challenge, our model is 90 times more parameter-efficient and 36 times more computation-efficient while achieving the required test perplexity of 35 on the Wikitext-103 dataset. We hope that this work will aid future research into efficient language models, and we have released our full source code at https://github.com/mit-han-lab/neurips-micronet.
