Compressing Language Models using Doped Kronecker Products

  • 2020-01-31 05:36:36
  • Urmish Thakker, Paul Whatmough, Matthew Mattina, Jesse Beu


Kronecker Products (KP) have been used to compress IoT RNN applications by compression factors of 15-38x, achieving better results than traditional compression methods. However, when KP is applied to large Natural Language Processing (NLP) tasks, it leads to significant accuracy loss (approximately 26%). This paper proposes a way to recover the accuracy otherwise lost when applying KP to large NLP tasks by allowing additional degrees of freedom in the KP matrix. More formally, we propose doping, a process of adding an extremely sparse overlay matrix on top of the pre-defined KP structure. We call this compression method doped Kronecker product compression. To train these models, we present a new solution to the phenomenon of co-matrix adaption (CMA), which uses a new regularization scheme called co-matrix dropout regularization (CMR). We present experimental results that demonstrate compression of a large language model with LSTM layers of size 25 MB by 25x with a 1.4% loss in perplexity score. At 25x compression, an equivalent pruned network leads to a 7.9% loss in perplexity score, while HMD and LMF lead to 15% and 27% loss in perplexity score, respectively.
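To make the idea of doping concrete, here is a minimal sketch, assuming a doped weight matrix is formed as the dense sum W = A ⊗ B + S, where A and B are the small Kronecker factors and S is the extremely sparse overlay stored as a coordinate dictionary. The function names (`kron`, `doped_kron`, `param_count`, `compression_factor`) are hypothetical and chosen for illustration only. The sketch ignores the index storage cost of S and does not model training or the co-matrix dropout regularization (CMR) scheme; it only shows how the overlay adds degrees of freedom on top of the KP structure and how that affects the parameter count.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
# Hypothetical sparse overlay format: (row, col) -> value
SparseOverlay = Dict[Tuple[int, int], float]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Dense Kronecker product: block (i, j) of the result is a[i][j] * b."""
    rb, cb = len(b), len(b[0])
    rows, cols = len(a) * rb, len(a[0]) * cb
    return [
        [a[r // rb][c // cb] * b[r % rb][c % cb] for c in range(cols)]
        for r in range(rows)
    ]


def doped_kron(a: Matrix, b: Matrix, overlay: SparseOverlay) -> Matrix:
    """Kronecker structure plus a sparse overlay ("doping") added on top."""
    w = kron(a, b)  # fresh matrix, so the in-place additions below are safe
    for (r, c), v in overlay.items():
        w[r][c] += v
    return w


def param_count(a: Matrix, b: Matrix, overlay: SparseOverlay) -> int:
    """Stored values only; index storage for the overlay is ignored (assumption)."""
    return len(a) * len(a[0]) + len(b) * len(b[0]) + len(overlay)


def compression_factor(a: Matrix, b: Matrix, overlay: SparseOverlay) -> float:
    """Ratio of dense parameters to doped-KP parameters."""
    dense = len(a) * len(b) * len(a[0]) * len(b[0])
    return dense / param_count(a, b, overlay)


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[0.0, 1.0], [1.0, 0.0]]
    S = {(0, 0): 0.5}  # a single "dopant" outside the pure KP structure
    for row in doped_kron(A, B, S):
        print(row)
    print(compression_factor(A, B, S))
```

In a real layer the dense W would normally not be materialized; the point of the sketch is that every non-zero in S costs one extra parameter, so the overlay must stay extremely sparse for the compression factor to remain close to that of plain KP.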

