Abstract
Object detection remains a challenging task in computer vision. Although deep neural networks have brought significant progress in object detection, attention mechanisms for detection are far from fully developed. In this paper, we propose a hybrid attention mechanism for single-stage object detection. First, we present spatial attention, channel attention, and aligned attention modules for single-stage object detection. In particular, stacked dilated convolution layers with symmetrically fixed rates are constructed to learn spatial attention. Channel attention is built from cross-level group normalization and a squeeze-and-excitation module. Aligned attention is constructed with organized deformable filters. Second, the three kinds of attention are unified into a hybrid attention mechanism. We then embed the hybrid attention into Retina-Net and propose the efficient single-stage HAR-Net for object detection. The attention modules and the proposed HAR-Net are evaluated on the COCO detection dataset. Experiments demonstrate that hybrid attention significantly improves detection accuracy and that HAR-Net achieves a state-of-the-art 45.8\% mAP, outperforming existing single-stage object detectors.
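Two of the ingredients named above can be illustrated with a minimal, dependency-free sketch. It is not the HAR-Net implementation, and it assumes details the abstract does not give. `receptive_field` shows why stacking stride-1 dilated 3x3 convolutions enlarges spatial context. The symmetric rate schedule `(1, 2, 4, 2, 1)` used below is a hypothetical example, not the paper's configuration. `squeeze_excitation` is a generic squeeze-and-excitation block: global average pooling, FC, ReLU, FC, sigmoid, then per-channel rescaling. The weight matrices and the reduction ratio are hypothetical placeholders. Cross-level group normalization and aligned (deformable) attention are not shown.

```python
import math
from typing import List, Sequence

Tensor3D = List[List[List[float]]]  # feature map indexed as [channel][row][col]


def receptive_field(rates: Sequence[int], kernel: int = 3) -> int:
    """Receptive field of stacked stride-1 convolutions with the given dilation rates.

    A k x k kernel with dilation d spans d * (k - 1) + 1 pixels, so each
    stride-1 layer widens the receptive field by d * (k - 1) pixels.
    """
    return 1 + sum(d * (kernel - 1) for d in rates)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def squeeze_excitation(x: Tensor3D,
                       w1: List[List[float]], b1: List[float],
                       w2: List[List[float]], b2: List[float]) -> Tensor3D:
    """Generic squeeze-and-excitation channel attention on a C x H x W map.

    Hypothetical weights: w1 has shape (C/r, C), w2 has shape (C, C/r)
    for some reduction ratio r.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1).
    h = [max(0.0, sum(w * v for w, v in zip(row, s)) + b) for row, b in zip(w1, b1)]
    a = [_sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(w2, b2)]
    # Scale: multiply every position of channel c by its attention weight a[c].
    return [[[a[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]


if __name__ == "__main__":
    # Hypothetical symmetric dilation schedule: 1 + 2 * (1 + 2 + 4 + 2 + 1) = 21.
    print(receptive_field([1, 2, 4, 2, 1]))  # 21

    # Toy 2-channel, 2x2 map with zero weights: every channel weight is sigmoid(0) = 0.5.
    x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
    out = squeeze_excitation(x, [[0.0, 0.0]], [0.0], [[0.0], [0.0]], [0.0, 0.0])
    print(out)  # [[[0.5, 1.0], [1.5, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
```

A symmetric schedule grows and then shrinks the dilation. A common motivation for this pattern is to enlarge context without leaving the gridding gaps that repeated large dilations can produce. Whether this matches the paper's reasoning is an assumption.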