TResNet: High Performance GPU-Dedicated Architecture

Abstract

Many deep learning models developed in recent years reach higher ImageNet accuracy than ResNet50 with fewer or comparable FLOPs. While FLOPs are often seen as a proxy for network efficiency, when actual GPU training and inference throughput is measured, vanilla ResNet50 is usually significantly faster than its recent competitors, offering a better throughput-accuracy trade-off. In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy while retaining their GPU training and inference efficiency. We first demonstrate and discuss the bottlenecks induced by FLOPs optimizations. We then suggest alternative designs that better utilize GPU structure and assets. Finally, we introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets. Using a TResNet model with GPU throughput similar to ResNet50, we reach 80.7% top-1 accuracy on ImageNet. Our TResNet models also transfer well and achieve state-of-the-art accuracy on competitive datasets such as Stanford Cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%) and Oxford-Flowers (99.1%). Implementation is available at: https://github.com/mrT23/TResNet
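
One common back-of-envelope way to see why a lower FLOPs count need not translate into higher GPU throughput is to compare FLOPs with the memory traffic a layer requires (its arithmetic intensity). The sketch below is an illustration under stated assumptions, not the paper's own analysis: it assumes a stride-1, "same"-padded convolution on a 14x14x256 feature map, fp16 storage (2 bytes per element), and a lower-bound memory model in which input, weights and output are each touched exactly once. The helper name `conv_cost` is hypothetical. It compares a standard 3x3 convolution with a depthwise-separable replacement (3x3 depthwise followed by 1x1 pointwise).

```python
def conv_cost(h, w, c_in, c_out, k, groups=1, bytes_per_elem=2):
    """Hypothetical helper: rough cost of a stride-1, 'same'-padded 2D conv.

    Returns (flops, bytes_moved, arithmetic_intensity).
    - One multiply-add is counted as 2 FLOPs.
    - bytes_moved is a lower bound: read input and weights once, write output once
      (caches, padding and intermediate buffers are ignored).
    - bytes_per_elem=2 assumes fp16 tensors.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    weights = c_out * (c_in // groups) * k * k
    macs = h * w * c_out * (c_in // groups) * k * k
    flops = 2 * macs
    elems = h * w * c_in + weights + h * w * c_out
    bytes_moved = elems * bytes_per_elem
    return flops, bytes_moved, flops / bytes_moved


# Assumed layer shape: 14x14 spatial, 256 -> 256 channels.
std = conv_cost(14, 14, 256, 256, k=3)              # standard 3x3 conv
dw = conv_cost(14, 14, 256, 256, k=3, groups=256)   # 3x3 depthwise conv
pw = conv_cost(14, 14, 256, 256, k=1)               # 1x1 pointwise conv

sep_flops = dw[0] + pw[0]
sep_bytes = dw[1] + pw[1]

for name, (flops, nbytes, ai) in [
    ("3x3 standard", std),
    ("3x3 depthwise", dw),
    ("1x1 pointwise", pw),
    ("separable total", (sep_flops, sep_bytes, sep_flops / sep_bytes)),
]:
    print(f"{name:16s} {flops / 1e6:8.2f} MFLOPs  {nbytes / 1e3:8.1f} KB  "
          f"intensity {ai:6.1f} FLOPs/byte")
```

Under these assumptions the separable block needs roughly 8.7x fewer FLOPs than the standard convolution, but its depthwise part performs only a few FLOPs per byte moved. On GPUs, whose compute throughput is high relative to memory bandwidth, such low-intensity layers tend to be limited by memory traffic rather than arithmetic, so the FLOPs saving may not appear as a proportional speedup. Actual timings depend on the kernels, batch size and hardware, and should be measured rather than inferred from this estimate.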
