Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within a single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely used datasets, e.g., CIFAR-100 and ImageNet. Ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over state-of-the-art baseline methods. The source code and trained models will be made publicly available.
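
To make the claim about receptive-field range concrete, the following is a minimal sketch rather than the released implementation. It assumes a formulation the abstract does not spell out: the block's feature channels are divided into `scale` groups, the first group passes through unchanged, the second is filtered by a 3x3 convolution, and each later group is added to the previous group's output before its own 3x3 convolution. The function name `res2net_receptive_fields` and its parameters are hypothetical, introduced only for illustration.

```python
def res2net_receptive_fields(scale: int, kernel_size: int = 3) -> list[int]:
    """Receptive field (along one axis) of each output group in one block.

    Assumed scheme (not stated in the abstract):
      group 0: identity, no convolution
      group 1: conv(x_1)
      group i: conv(x_i + y_{i-1}) for i >= 2
    All convolutions are assumed to have stride 1 and no dilation.
    """
    if scale < 1:
        raise ValueError("scale must be at least 1")

    fields = [1]  # group 0 is passed through, so it sees a single position
    for i in range(1, scale):
        # The conv input is x_i (receptive field 1), plus the previous
        # group's output when i >= 2, which carries a larger receptive field.
        input_field = 1 if i == 1 else max(1, fields[-1])
        # A stride-1 k x k convolution widens the field by k - 1.
        fields.append(input_field + (kernel_size - 1))
    return fields


if __name__ == "__main__":
    for s in (1, 2, 4):
        print(s, res2net_receptive_fields(s))
```

Under these assumptions, one block with four groups yields outputs whose receptive fields span 1, 3, 5, and 7 positions, whereas a single 3x3 convolution over all channels yields only 3. This is the sense in which multi-scale features arise inside a block rather than only across layers.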