Semantic segmentation in very high resolution (VHR) aerial images is one of the most challenging tasks in remote sensing image understanding. Most current approaches are based on deep convolutional neural networks (DCNNs) because of their remarkable feature representation ability. Specifically, attention-based methods can effectively capture long-range dependencies and further reconstruct the feature maps for better representation. However, limited to the perspectives of spatial and channel attention, and burdened by the huge computational complexity of the self-attention mechanism, these methods are unlikely to model effective semantic interdependencies between every pixel pair. In this work, we propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations from the perspectives of space, channel, and category in a more effective and efficient manner. Concretely, a class augmented attention (CAA) module embedded with a class channel attention (CCA) module can be used to compute category-based correlation and recalibrate the class-level information. Additionally, we introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism via region-wise representations. Extensive experimental results on the ISPRS Vaihingen and Potsdam benchmarks demonstrate the effectiveness and efficiency of HMANet over other state-of-the-art methods.