Designing convolutional neural network (CNN) models for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant effort has been dedicated to designing and improving mobile models on all three dimensions, it is difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated neural architecture search approach for designing resource-constrained mobile CNN models. We explicitly incorporate latency information into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where mobile latency is considered via another, often inaccurate proxy (e.g., FLOPS), we directly measure real-world inference latency by executing the model on a particular platform, e.g., Pixel phones. To strike the right balance between flexibility and search space size, we further propose a novel factorized hierarchical search space that permits layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our model achieves 74.0% top-1 accuracy with 76ms latency on a Pixel phone, which is 1.5x faster than MobileNetV2 (Sandler et al. 2018) and 2.4x faster than NASNet (Zoph et al. 2018) at the same top-1 accuracy. On the COCO object detection task, our model family achieves both higher mAP and lower latency than MobileNets.
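To make the idea of folding latency into the search objective concrete, the following is a minimal sketch of one possible latency-aware reward, assuming a weighted-product form in which accuracy is scaled by a power of the ratio between measured latency and a target latency. The abstract does not state the exact form of the objective, so the function names, the default exponent values, and the candidate numbers below are illustrative assumptions only.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """Combine accuracy and measured latency into a single scalar reward.

    Assumed form: accuracy * (latency / target) ** w, where w = alpha when the
    model meets the latency target and w = beta when it exceeds it. Negative
    exponents reward faster models and penalize slower ones.
    The exponent defaults are illustrative assumptions.
    """
    if latency_ms <= 0 or target_ms <= 0:
        raise ValueError("latencies must be positive")
    w = alpha if latency_ms <= target_ms else beta
    return accuracy * (latency_ms / target_ms) ** w


def pick_best(candidates, target_ms):
    """Return the (name, accuracy, latency_ms) tuple with the highest reward."""
    return max(
        candidates,
        key=lambda c: latency_aware_reward(c[1], c[2], target_ms),
    )


# Hypothetical candidates: (name, top-1 accuracy, measured on-device latency in ms).
# These numbers are made up for illustration, not results from the paper.
candidates = [
    ("a", 0.74, 76.0),
    ("b", 0.75, 150.0),
    ("c", 0.70, 40.0),
]

best = pick_best(candidates, target_ms=80.0)
print(best[0])
```

With this form, a candidate that is slightly more accurate but much slower than the target can lose to a candidate that sits near the target, which is the kind of trade-off a single combined objective is meant to expose. In a real search, `latency_ms` would come from running each candidate on the target device rather than from a proxy such as FLOPS.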