LIP: Local Importance-based Pooling

Abstract

Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and lower memory consumption. However, for discriminative tasks, these layers may lose discriminative details due to improper pooling strategies, which can hinder the learning process and eventually result in suboptimal models. In this paper, we present a unified framework over the existing downsampling layers (e.g., average pooling, max pooling, and strided convolution) from a local importance perspective. In this framework, we analyze the problems of these widely used pooling layers and identify the criteria for designing an effective downsampling layer. Based on this analysis, we propose a conceptually simple, general, and effective pooling layer based on local importance modeling, termed Local Importance-based Pooling (LIP). LIP can automatically enhance discriminative features during the downsampling procedure by learning adaptive importance weights based on inputs in an end-to-end manner. Experimental results show that LIP consistently yields notable gains with different depths and different architectures on ImageNet classification. On the challenging MS COCO dataset, detectors with our LIP-ResNets as backbones obtain a consistent improvement ($\ge 1.4\%$) over plain ResNets, and in particular achieve state-of-the-art performance in detecting small objects.
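
The local importance view described above can be illustrated with a minimal sketch in plain Python. The abstract gives no formula, so everything here is an assumption made for illustration: each output is taken to be the average of a window's activations weighted by the exponential of a per-location importance logit, the function name `importance_pool` is hypothetical, and the logits are passed in directly rather than produced by a learned sub-network as LIP would do.

```python
import math

def importance_pool(x, logits, k=2, stride=2):
    """Importance-weighted pooling over k x k windows (no padding).

    x and logits are same-shaped 2-D lists. Each output is the average of
    the window's values weighted by exp(logit), i.e. a local softmax.
    This normalization is an assumption, not taken from the abstract.
    """
    rows, cols = len(x), len(x[0])
    out = []
    for i in range(0, rows - k + 1, stride):
        out_row = []
        for j in range(0, cols - k + 1, stride):
            window = [(x[i + di][j + dj], logits[i + di][j + dj])
                      for di in range(k) for dj in range(k)]
            # Subtract the window's max logit for numerical stability.
            m = max(l for _, l in window)
            weights = [math.exp(l - m) for _, l in window]
            total = sum(weights)
            out_row.append(
                sum(w * v for w, (v, _) in zip(weights, window)) / total)
        out.append(out_row)
    return out

# Toy 4 x 4 feature map (made-up values).
feature_map = [[1.0, 2.0, 0.0, 4.0],
               [3.0, 4.0, 1.0, 1.0],
               [0.0, 0.0, 5.0, 6.0],
               [0.0, 8.0, 7.0, 8.0]]

# Uniform importance (all logits equal) reduces to average pooling.
uniform = [[0.0] * 4 for _ in range(4)]
avg_like = importance_pool(feature_map, uniform)

# Importance that grows sharply with the activation approaches max pooling.
sharp = [[50.0 * v for v in row] for row in feature_map]
max_like = importance_pool(feature_map, sharp)
```

Under these assumptions, average pooling and max pooling appear as two fixed choices of importance: constant, and sharply peaked at the largest activation. A learned importance map would sit between these extremes and adapt to the input.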
