We address a learning-to-normalize problem by proposing SwitchableNormalization (SN), which learns to select different operations for differentnormalization layers of a deep neural network (DNN). The statistics (means andvariances) of SN are computed in three scopes, including a channel, a layer,and a minibatch. SN selects them by learning their importance weights in anend-to-end manner. It has several appealing benefits. First, SN adapts tovarious network architectures and tasks. Second, it is robust to a wide rangeof batch sizes, maintaining high performance when small minibatch is presented(e.g. 2 images/GPU). Third, the above benefits of SN are obtained by treatingall channels as a group, unlike group normalization that searches the number ofgroups as the hyper-parameter. Extensive evaluations demonstrate that SNoutperforms its counterparts on various challenging problems, such as imageclassification in ImageNet, object detection and segmentation in COCO, artisticimage stylization, and neural architecture search. The code of SN will be madeavailable in https://github.com/switchablenorms/.