Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Abstract

Large batch size training of neural networks has been shown to incur accuracy loss with current training methods. The precise underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian-based study to analyze how the landscape of the loss functional differs for large batch size training. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Our results on multiple networks show that, when training at large batch sizes, one tends to stop at points in the parameter space with a noticeably larger Hessian spectrum, i.e., where the eigenvalues of the Hessian are much larger. We then study how batch size affects the robustness of the model in the face of adversarial attacks. All the results show that models trained with large batches are more susceptible to adversarial attacks than models trained with small batch sizes. Furthermore, we prove a theoretical result which shows that the problem of finding an adversarial perturbation is a saddle-free optimization problem. Finally, we show empirical results demonstrating that adversarial training leads to areas with a smaller Hessian spectrum. We present detailed experiments with five different network architectures tested on the MNIST, CIFAR-10, and CIFAR-100 datasets.
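
As an illustration of how a dominant Hessian eigenvalue can be estimated without forming the Hessian, the following is a minimal sketch of power iteration on Hessian-vector products. It is not the implementation used in the paper: the abstract states that the second derivative is back-propagated, whereas this sketch substitutes a central finite difference of the gradient so that it runs with the Python standard library alone. The toy quadratic loss, the matrix entries, and the names grad, hvp, and top_eigenvalue are all assumptions made for the example.

```python
import math
import random


def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * w^T A w with A = [[3, 1], [1, 2]].
    # (Hypothetical stand-in for a network's loss gradient.)
    return [3.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 2.0 * w[1]]


def hvp(grad_fn, w, v, eps=1e-5):
    # Hessian-vector product via central finite differences of the gradient:
    #   H v ~= (g(w + eps * v) - g(w - eps * v)) / (2 * eps)
    # Assumption: this replaces the second back-propagation pass.
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]


def top_eigenvalue(grad_fn, w, iters=200, seed=0):
    # Power iteration: repeatedly apply H to a unit vector and renormalize.
    # Converges to the eigenvalue of largest magnitude.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        # Rayleigh quotient v^T H v for the current unit vector v.
        eig = sum(a * b for a, b in zip(v, hv))
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return eig


estimate = top_eigenvalue(grad, [0.5, -0.3])
```

For this toy matrix the largest eigenvalue is (5 + sqrt(5)) / 2, so the estimate can be checked against a closed form. With a real network, grad would be replaced by the gradient of the training loss with respect to the parameters, and the finite difference by an exact second-derivative pass.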
