Abstract
Deep learning models have achieved state-of-the-art performance in various domains, but they are vulnerable to inputs with small, well-crafted perturbations, known as adversarial examples (AEs). Among the many strategies for improving model robustness against AEs, adversarial training based on Projected Gradient Descent (PGD) is one of the most effective. Unfortunately, generating sufficiently strong AEs requires maximizing the loss function, and the resulting computational overhead can make regular PGD adversarial training impractical for larger and more complicated models. In this paper, we propose approximating the adversarial loss by a partial sum of its Taylor series. Furthermore, we approximate the gradient of the adversarial loss and propose a new, efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models. Extensive experiments demonstrate that this efficiency improvement can be achieved with little or no loss in accuracy on natural and adversarial examples: our proposed method saves up to 60\% of the training time with comparable model test accuracy on the MNIST, CIFAR-10, and CIFAR-100 datasets.
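To make the idea of a Taylor partial sum concrete, the following is a minimal sketch of the simplest case, a first-order expansion of the loss around the clean input. The notation ($\ell$, $\theta$, $\delta$, $\epsilon$), the choice of an $\ell_\infty$ perturbation budget, and the truncation at first order are all assumptions made for illustration; the abstract does not specify which terms of the series GAAT retains.

```latex
% Assumed notation (not taken from the abstract):
%   \ell      training loss
%   \theta    model parameters
%   (x, y)    a labelled input
%   \delta    perturbation with \|\delta\|_\infty \le \epsilon
\begin{align}
  \ell(\theta, x + \delta, y)
    &\approx \ell(\theta, x, y) + \delta^{\top} \nabla_x \ell(\theta, x, y), \\
  \max_{\|\delta\|_\infty \le \epsilon} \ell(\theta, x + \delta, y)
    &\approx \ell(\theta, x, y) + \epsilon \, \bigl\| \nabla_x \ell(\theta, x, y) \bigr\|_1 .
\end{align}
```

Under this linearization the maximizer is $\delta = \epsilon \, \mathrm{sign}(\nabla_x \ell(\theta, x, y))$, which needs a single input gradient instead of an iterative PGD loop. This illustrates where a saving in training time could come from, though the method proposed in the paper may use a different or higher-order approximation.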