Abstract
We propose a general method to train a single convolutional neural network which is capable of switching image resolutions at inference. Thus the running speed can be selected to meet various computational resource limits. Networks trained with the proposed method are named Resolution Switchable Networks (RS-Nets). The basic training framework shares network parameters for handling images which differ in resolution, yet keeps separate batch normalization layers. Though it is parameter-efficient in design, it leads to inconsistent accuracy variations at different resolutions, for which we provide a detailed analysis from the aspect of the train-test recognition discrepancy. A multi-resolution ensemble distillation is further designed, where a teacher is learnt on the fly as a weighted ensemble over resolutions. Thanks to the ensemble and knowledge distillation, RS-Nets enjoy accuracy improvements at a wide range of resolutions compared with individually trained models. Extensive experiments on the ImageNet dataset are provided, and we additionally consider quantization problems. Code and models are available at https://github.com/yikaiw/RS-Nets.
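The ensemble distillation idea can be illustrated with a small, dependency-free sketch. It is not taken from the released code. It assumes the teacher is a weighted sum of per-resolution logits, that the weights are normalized with a softmax, and that each resolution is distilled with a KL divergence. The function names (`ensemble_logits`, `distillation_losses`) and the example logits are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(logits_by_res, raw_weights):
    """Teacher logits: weighted sum of the per-resolution logits.

    Assumption: raw_weights are free parameters, normalized by a softmax
    so that the ensemble weights are positive and sum to one.
    """
    weights = softmax(raw_weights)
    num_classes = len(logits_by_res[0])
    return [
        sum(w * z[c] for w, z in zip(weights, logits_by_res))
        for c in range(num_classes)
    ]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_losses(logits_by_res, raw_weights):
    """One distillation loss per resolution: KL(teacher || student)."""
    teacher = softmax(ensemble_logits(logits_by_res, raw_weights))
    return [kl_divergence(teacher, softmax(z)) for z in logits_by_res]


# Hypothetical logits for one image evaluated at three resolutions.
logits = [[2.0, 0.5, -1.0], [1.5, 0.7, -0.5], [0.8, 0.9, 0.1]]
losses = distillation_losses(logits, raw_weights=[0.0, 0.0, 0.0])
print(losses)
```

In an actual training loop the ensemble weights and the shared network parameters would be updated by gradient descent, with each resolution routed through its own batch normalization layers. The sketch only shows how the teacher and the per-resolution distillation terms could be formed.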