Abstract
Smartphones today host a multitude of deep learning models for local execution. A key observation driving this work is that these models are highly fragmented, with varied architectures, operators, and implementations. This fragmentation makes it difficult to optimize hardware, system settings, and algorithms comprehensively. Building on recent advances in large foundation models, this work introduces a new paradigm for mobile AI: the mobile OS and hardware jointly manage a foundation model capable of serving a broad spectrum of mobile AI tasks, if not all. This foundation model resides within the NPU and, like firmware, remains unchanged across app or OS revisions. Each app, in turn, contributes a concise, offline fine-tuned "adapter" tailored to a distinct downstream task. From this concept emerges a concrete instantiation called \sys. It combines a curated selection of publicly available Large Language Models (LLMs) and facilitates dynamic data flow. The concept's viability is substantiated through a comprehensive benchmark of 38 mobile AI tasks spanning 50 datasets, covering domains such as Computer Vision (CV), Natural Language Processing (NLP), audio, sensing, and multimodal inputs. Across this benchmark, and in contrast to task-specific models tailored for individual applications, \sys attains accuracy parity in 85\% of tasks, demonstrates improved scalability in terms of storage and memory, and offers satisfactory inference speed on Commercial Off-The-Shelf (COTS) mobile devices with NPU support.