Statistical Model Compression for Small-Footprint Natural Language Understanding

Abstract

In this paper we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware-restricted devices and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques: parameter quantization and perfect feature hashing. These techniques are complementary to existing model pruning strategies such as L1 regularization. We performed experiments on a large-scale NLU system. The results show that our approach achieves a 14-fold reduction in memory usage compared to the original models with minimal impact on predictive performance.
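
The abstract names parameter quantization without describing the scheme. As a rough illustration of the general idea only, the sketch below assumes a uniform 8-bit grid over the observed weight range; this is an illustrative choice and not necessarily the quantizer used in the paper. The function names quantize_uniform and dequantize_uniform are hypothetical.

```python
from array import array


def quantize_uniform(weights, num_bits=8):
    """Map float weights to integer codes on a uniform grid over [min, max].

    Assumption: uniform (linear) quantization, chosen for illustration only.
    """
    if not 1 <= num_bits <= 8:
        raise ValueError("this sketch stores codes in one byte, so num_bits must be 1..8")
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1
    # Guard against a constant weight vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest grid point and clamp to the valid code range.
    codes = array("B", (min(levels, max(0, round((w - lo) / scale))) for w in weights))
    return codes, lo, scale


def dequantize_uniform(codes, lo, scale):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_uniform(weights)
restored = dequantize_uniform(codes, lo, scale)
```

In this sketch each weight is stored as a one-byte code plus two shared floats (the offset and the step size), and the reconstruction error per weight is at most half a step.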
