Abstract
Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion. Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieves much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with a single model and a single scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4x to 9x smaller and using 13x to 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.
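
To make the weighted fusion idea concrete, the following is a minimal sketch and not the released implementation. It assumes that each input feature carries one learnable scalar weight, that weights are clamped to be non-negative, and that they are normalized by their sum plus a small constant. The names `weighted_fusion`, `raw_weights`, and `eps` are hypothetical, and the inputs are plain lists standing in for feature maps already resized to a common resolution.

```python
from typing import List, Sequence


def weighted_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors with normalized, non-negative weights.

    Assumption: one scalar weight per input feature (hypothetical setup).
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all input features must have the same length")

    # Clamp at zero (a ReLU) so every normalized weight lies in [0, 1].
    weights = [max(0.0, w) for w in raw_weights]
    # eps keeps the division stable when all weights are zero.
    total = sum(weights) + eps

    return [
        sum(w * f[i] for w, f in zip(weights, features)) / total
        for i in range(length)
    ]


# Two flattened feature maps, assumed already resized to a common resolution.
p_high = [1.0, 2.0, 3.0]
p_low = [3.0, 2.0, 1.0]
fused = weighted_fusion([p_high, p_low], raw_weights=[1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the inputs. A negative raw weight is clamped to zero, so that input contributes nothing to the fused feature.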