Abstract
Keyword spotting (KWS) is beneficial for voice-based user interaction with low-power devices at the edge. Edge devices are usually always-on, so edge computing brings bandwidth savings and privacy protection. These devices, for example Cortex-M based microcontrollers, typically have limited memory, computational performance, power, and cost budgets. The challenge is to meet the high-computation and low-latency requirements of deep learning on such devices. This paper first presents our small-footprint KWS system running on an STM32F7 microcontroller with a Cortex-M7 core at 216 MHz and 512 KB of static RAM. Our selected convolutional neural network (CNN) architecture has a reduced number of operations for KWS to meet the constraints of edge devices. Our baseline system generates a classification result every 37 ms, including real-time audio feature extraction. This paper further evaluates the actual performance of different pruning and quantization methods on the microcontroller, including different granularities of sparsity, skipping zero weights, weight-prioritized loop order, and SIMD instructions. The results show that, on microcontrollers, accelerating unstructured pruned models poses considerable challenges, and that structured pruning is more hardware-friendly than unstructured pruning. The results also confirm the performance improvement from quantization and SIMD instructions.
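To make the pruning terms above concrete, the following is a minimal sketch in Python. It is not the paper's kernel code, and all function names and values are hypothetical. It contrasts an output-prioritized dense loop with a weight-prioritized loop that skips zero weights, and shows structured pruning as the removal of whole all-zero filters. The multiply-accumulate (MAC) counts only indicate arithmetic work; they do not model the branch and indexing overhead that can make unstructured sparsity hard to exploit on a microcontroller.

```python
def conv1d_dense(x, w):
    """Output-prioritized loop: every output visits every weight, zero or not."""
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    macs = 0
    for i in range(n_out):
        for k, wk in enumerate(w):
            y[i] += wk * x[i + k]
            macs += 1
    return y, macs


def conv1d_weight_first(x, w):
    """Weight-prioritized loop: each weight is read once and zero weights are skipped."""
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    macs = 0
    for k, wk in enumerate(w):
        if wk == 0:
            continue  # one check per weight, not one per output element
        for i in range(n_out):
            y[i] += wk * x[i + k]
            macs += 1
    return y, macs


def drop_zero_filters(filters):
    """Structured pruning (illustrative): drop whole all-zero filters so the
    unchanged dense kernel simply runs fewer times."""
    return [f for f in filters if any(f)]


# Hypothetical input and an unstructured-sparse filter (individual zeros).
x = [1, 2, 3, 4, 5, 6]
w = [2, 0, 0, -1]

dense_y, dense_macs = conv1d_dense(x, w)
sparse_y, sparse_macs = conv1d_weight_first(x, w)

# Hypothetical filter bank where one whole filter has been pruned to zero.
filters = [[2, 0, 0, -1], [0, 0, 0, 0], [1, 1, 0, 0]]
kept_filters = drop_zero_filters(filters)
```

Both loops produce the same output; the weight-prioritized version performs half the MACs here because two of the four weights are zero. With structured pruning, the zero filter disappears from the filter bank entirely, so no per-weight test is needed at inference time.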