Processing Natural Language on Embedded Devices: How Well Do Modern Models Perform?

  • 2023-09-12 22:15:12
  • Souvika Sarkar, Mohammad Fakhruddin Babar, Md Mahadi Hassan, Monowar Hasan, Shubhra Kanti Karmaker Santu

Abstract

Voice-controlled systems are becoming ubiquitous in many IoT-specific applications such as home/industrial automation, automotive infotainment, and healthcare. While cloud-based voice services (e.g., Alexa, Siri) can leverage high-performance computing servers, some use cases (e.g., robotics, automotive infotainment) may require executing natural language processing (NLP) tasks offline, often on resource-constrained embedded devices. Large language models such as BERT and its variants are primarily developed with compute-heavy servers in mind. Despite the strong performance of BERT models across various NLP tasks, their large size and numerous parameters pose substantial obstacles to offline computation on embedded systems. Lighter replacements for such language models (e.g., DistilBERT and TinyBERT) often sacrifice accuracy, particularly for complex NLP tasks. It remains unclear (a) whether state-of-the-art language models, viz. BERT and its variants, are deployable on embedded systems with limited processing power, memory, and battery capacity, and (b) if they are, what the "right" set of configurations and parameters is for a given NLP task. This paper presents an exploratory study of modern language models under different resource constraints and accuracy budgets to derive empirical observations about these resource/accuracy trade-offs. In particular, we study how the four most commonly used BERT-based language models (BERT, RoBERTa, DistilBERT, and TinyBERT) perform on embedded systems. We tested them on a Raspberry Pi-based robotic platform with three hardware configurations and four datasets running various NLP tasks. Our findings can help designers understand the deployability and performance of modern language models, especially those based on BERT architectures, thus saving time otherwise wasted on trial-and-error efforts.

 
