Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller than even that of mobile phones. We propose MCUNet, a framework that jointly designs an efficient neural architecture (TinyNAS) and a lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture within the optimized search space. TinyNAS can automatically handle diverse constraints (i.e., device, latency, energy, memory) at low search cost. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library, to expand the search space and fit a larger model. TinyEngine adapts memory scheduling to the overall network topology rather than optimizing layer by layer, reducing memory usage by 4.8x and accelerating inference by 1.7-3.3x compared to TF-Lite Micro and CMSIS-NN. MCUNet is the first to achieve >70% ImageNet top-1 accuracy on an off-the-shelf commercial microcontroller, using 3.5x less SRAM and 5.7x less Flash than quantized MobileNetV2 and ResNet-18. On visual and audio wake word tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4-3.4x faster than MobileNetV2 and ProxylessNAS-based solutions, with 3.7-4.1x smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived. Code and models can be found here: https://tinyml.mit.edu.
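
To make the idea of fitting a search space to a memory budget concrete, the following is a minimal illustrative sketch and not the TinyNAS procedure itself. It assumes a toy network that is a plain chain of four strided convolutions with int8 activations, estimates peak activation memory as the largest input-plus-output footprint of any single layer, and keeps only the (input resolution, width multiplier) pairs whose estimate fits a given SRAM budget. The function names, layer widths, and the memory model are all hypothetical choices made for illustration.

```python
from itertools import product


def peak_activation_bytes(resolution, width_mult,
                          base_channels=(16, 32, 64, 128),
                          strides=(2, 2, 2, 2),
                          in_channels=3):
    """Estimate peak int8 activation memory for a toy chain of strided convs.

    Assumption: while a layer runs, only its input and output feature maps
    are resident, one byte per activation. Weights are ignored here.
    """
    size, c_in, peak = resolution, in_channels, 0
    for channels, stride in zip(base_channels, strides):
        c_out = max(1, int(channels * width_mult))
        out_size = -(-size // stride)  # ceiling division
        in_bytes = size * size * c_in
        out_bytes = out_size * out_size * c_out
        peak = max(peak, in_bytes + out_bytes)
        size, c_in = out_size, c_out
    return peak


def feasible_search_spaces(sram_budget_bytes, resolutions, width_mults):
    """Keep the (resolution, width multiplier) pairs that fit the budget."""
    return [
        (r, w)
        for r, w in product(resolutions, width_mults)
        if peak_activation_bytes(r, w) <= sram_budget_bytes
    ]


if __name__ == "__main__":
    # Hypothetical 20 kB activation budget.
    for r, w in feasible_search_spaces(20_000, [32, 64], [0.5, 1.0]):
        print(r, w, peak_activation_bytes(r, w))
```

In this toy model the early, high-resolution layers dominate peak memory, so lowering the input resolution frees far more SRAM than narrowing the later layers. A second stage would then search for a specific architecture only within the surviving configurations.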