Abstract
Neural Architecture Search (NAS) has been widely adopted to design accurate and efficient image classification models. However, applying NAS to a new computer vision task still requires a huge amount of effort. This is because 1) previous NAS research has over-prioritized image classification while largely ignoring other tasks; 2) many NAS works focus on optimizing task-specific components that cannot be favorably transferred to other tasks; and 3) existing NAS methods are typically designed to be "proxyless" and require significant effort to be integrated with each new task's training pipelines. To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much reduced computational cost and human effort. Specifically, we design 1) a search space that is simple yet inclusive and transferable; 2) a multitask search process that is disentangled from the target tasks' training pipelines; and 3) an algorithm to simultaneously search for architectures for multiple tasks with a computational cost agnostic to the number of tasks. We evaluate the proposed FBNetV5 on three fundamental vision tasks: image classification, object detection, and semantic segmentation. Models searched by FBNetV5 in a single run of search have outperformed the previous state of the art in all three tasks: image classification (e.g., +1.3% ImageNet top-1 accuracy under the same FLOPs as compared to FBNetV3), semantic segmentation (e.g., +1.8% ADE20K val. mIoU with 3.6x fewer FLOPs as compared to SegFormer), and object detection (e.g., +1.1% COCO val. mAP with 1.2x fewer FLOPs as compared to YOLOX).