Directional Adversarial Training for Cost Sensitive Deep Learning Classification Applications

  • 2019-10-08 15:40:09
  • Matteo Terzi, Gian Antonio Susto, Pratik Chaudhari





Matteo Terzi1, Gian Antonio Susto1,2, Pratik Chaudhari3


1 Human Inspired Technology Center, University of Padova.
2 Department of Information Engineering, University of Padova.
3 University of Pennsylvania.
Email: [email protected],[email protected], [email protected]

Abstract: In many real-world applications of Machine Learning it is of paramount importance not only to provide accurate predictions, but also to ensure certain levels of robustness. Adversarial Training is a training procedure that aims to provide models that are robust to worst-case perturbations around predefined points. Unfortunately, one of the main issues in adversarial training is that robustness w.r.t. gradient-based attackers is always achieved at the cost of prediction accuracy. In this paper, a new algorithm for adversarial training, called Wasserstein Projected Gradient Descent (WPGD), is proposed. WPGD provides a simple way to obtain cost-sensitive robustness, resulting in a finer control of the robustness-accuracy trade-off. Moreover, WPGD solves an optimal transport problem on the output space of the network and can efficiently discover directions where robustness is required, which makes it possible to control the directional trade-off between accuracy and robustness. The proposed WPGD is validated in this work on image recognition tasks with different benchmark datasets and architectures. Moreover, real world-like datasets are often unbalanced: this paper shows that, when dealing with this type of dataset, the performance of adversarial training is mainly affected in terms of standard accuracy.

Keywords: Adversarial training, Artificial Intelligence, Cost-sensitive, Deep Learning, Image Classification, Optimal Transport, Wasserstein

1. Introduction

Recent advancements in Deep Learning have led to several breakthrough applications in many fields, such as Computer Vision [24], Health-care [12], Industry 4.0 [29, 39], Natural Language Processing [52], Speech Recognition [34] and Transportation [16]. A crucial requirement for many applications in these fields is to have models that do not exhibit unexpected behaviors. However, deep neural networks (DNNs) do not satisfy this property under some circumstances.

Probably the most alarming behavior of DNNs [5, 26] for classification tasks is that they are susceptible to adversarial perturbations: in the context of Computer Vision, for example, these are modifications to the input image that, although imperceptible to the human eye, cause the network to confidently misclassify the image [47]. These perturbations are easy to synthesize and they may even generalize across different networks [32]. This suggests surprising vulnerabilities in these state-of-the-art classifiers and it has resulted in a flurry of activity towards understanding this phenomenon [14, 43], building robustness and defenses against it [18, 28], as well as discovering new attacks [3, 7, 35, 36]. Adversarial robustness is fundamental in many real-world applications; in important applications like autonomous driving [38] and predictive maintenance [46], errors and faults have different priorities and importance. For example, if the recognition system of an autonomous car misclassifies a cat as a dog, there should reasonably be no damage, while misclassifying a human could lead to dramatic consequences.

Adversarial robustness is here defined as the accuracy of a given model evaluated on the worst-case input within a prescribed neighbourhood. More informally, it can be considered as the accuracy of the model in worst-case scenarios. In this context, the most common and effective approach to enable robustness to adversarial examples in DNNs is Adversarial Training [28], whose idea is to train a model with these worst-case examples, called adversarial examples, instead of using clean data, i.e. data measured either without error or with negligible error. Thus, it is a training procedure belonging to the class of minimax problems [40], in which an inner loop finds the worst-case data point x through gradient ascent and the outer loop minimizes the target loss on x.

Unfortunately, adversarial robustness comes at the price of lower classification accuracy on clean data: this trade-off has been demonstrated by various analyses [13, 49]. As argued above, an adversarially robust classifier with low accuracy is unlikely to be used in practical applications that require both. Although much effort has been devoted to theoretically understanding robustness, its practical consequences in industrial applications have received little attention in the literature [20].

The present work aims at addressing the aforementioned issues with the following contributions:

  • it is shown that the quantitative and qualitative difference between robust and standard models correlates with the visual metric of classes, i.e. it is aligned with the human notion of distance between classes. Adversarially trained networks learn to (mostly) ignore fine-grained classification and confuse classes whose samples are close to the decision boundaries. This result is corroborated by [33], where it is shown that adversarial training leads to boundaries with low curvature;

  • it is shown that robust models are less confident in their predictions than standard models are;

  • inspired by the previous observation, Wasserstein Projected Gradient Descent (WPGD), an algorithm for adversarial training of deep networks, is presented here. WPGD improves the efficiency of the inner loop in gradient-based defenses such as Projected Gradient Descent (PGD). WPGD formulates an optimal transport problem on the label space with the underlying metric given by the distances of the classification boundaries between classes. This metric guides the search for adversarial perturbations towards classes that are visually dissimilar. It is shown that training deep networks using WPGD is effective in shaping boundaries to maintain directional robustness where required while maintaining accuracy on similar classes.

Moreover, it is worth noting that, although the experiments in this work regard image recognition tasks, the WPGD framework can be easily extended to other types of data such as time-series.

The rest of this paper is organized as follows. In Sec. 2 the building blocks of the proposed approach, estimating the distance to the boundaries and optimal transport, are presented, while properties of adversarial training are discussed in Sec. 3. In Sec. 4 the WPGD algorithm is introduced, and experimental results on the MNIST [25], CIFAR-10 and Tiny-ImageNet datasets for different deep networks are reported in Sec. 5. Related works and discussion are provided in Sec. 6 and Sec. 7, respectively.

2. Notation and building blocks

This section describes the notation and the main building blocks of the approach presented in this work.

Notation: Let $\theta \in \mathbb{R}^d$ denote the parameters of a neural network. Input images are denoted by $X = \{x_i : i \le N\}$, with pixel intensities normalized to lie in $[0,1]$. Given an image $x$, let $\kappa(x) \in \{1, \ldots, K\}$ be its ground-truth label; the one-hot encoding of $\kappa(x)$ is denoted by $y(x)$. The normalized probability distribution over the classes as predicted by the network is denoted by $\hat{y}(x) \in \mathbb{R}^K$; here $\hat{y}(x)_k$ denotes its $k$-th entry and $\hat{\kappa}(x) = \arg\max_k \hat{y}(x)_k$ is the predicted class. The cross-entropy loss can then be written as

$$\mathrm{CE}(\theta; x) = -\log \hat{y}(x)_{\kappa(x)} \qquad (1)$$

and training a network involves minimizing the average loss, i.e. $\arg\min_\theta \mathbb{E}_{x \in X}[\mathrm{CE}(\theta; x)]$.

The training dataset is represented by $\mathcal{D} = \{\mathbf{x}, \mathbf{y}\}$, where $\mathbf{x} = \{x_i\}_{i=1}^N$ and $\mathbf{y} = \{y_i\}_{i=1}^N$ are, respectively, a set of randomly sampled data points and their corresponding labels, generated from an unknown distribution $p_\psi(x, y)$ parametrized by $\psi$. In lieu of minimizing the expected loss over the training data, adversarial training solves

$$\min_\theta \mathbb{E}_{x \in X}\Big[\max_{x' \in \mathcal{S}(x)} \mathrm{CE}(\theta; x')\Big]; \qquad (2)$$

this is a saddle point problem where, at each iteration, candidate images $x'$ are chosen from a set $\mathcal{S}(x)$ (or a manifold). This has been a successful approach to training neural networks robustly w.r.t. adversarial perturbations, see [28, 22, 44, 21]. In this paper only $\mathcal{S}(x) = \mathcal{B}_\infty(x) = \{x' : \|x' - x\|_\infty \le \epsilon\}$, the infinity-norm ball around $x$, is considered, in order to obtain an algorithm based on PGD [6, 28].

It is remarked that the theoretical properties described in the following apply to general settings and not only to Euclidean perturbations. In this paper a distinction is made between natural error (NE) and adversarial error (AE), i.e. the errors obtained with natural images and with adversarial images, respectively. In the following, only $\ell_\infty$ perturbations are used in all the experiments on real datasets, while $\ell_2$ perturbations are used in the synthetic example of Sec. 3.4 (the reason for using $\ell_2$ instead of $\ell_\infty$ there is simply to ease visualization of the impact of adversarial training).
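The inner maximization of the saddle point problem (2) can be illustrated with a minimal sketch. It assumes a linear softmax classifier, so that the gradient with respect to the input has a closed form; the names `A`, `b`, `ce_loss_and_grad` and `pgd_attack`, the step size and the toy data are hypothetical and are not taken from the paper. The optional random start is a common PGD variant.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss_and_grad(A, b, x, label):
    """Cross-entropy of a linear softmax classifier and its gradient w.r.t. x."""
    y_hat = softmax(A @ x + b)
    one_hot = np.eye(len(b))[label]
    return -np.log(y_hat[label]), A.T @ (y_hat - one_hot)

def pgd_attack(A, b, x, label, eps, step, n_steps, rng=None):
    """Approximate the inner max of (2) over the l_inf ball of radius eps."""
    x_adv = x.copy()
    if rng is not None:  # optional uniform random start inside the ball
        x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(n_steps):
        _, g = ce_loss_and_grad(A, b, x_adv, label)
        x_adv = x_adv + step * np.sign(g)          # gradient-sign ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep pixel intensities valid
    return x_adv

# Toy problem (hypothetical data): 3 classes, 5 input features in [0, 1].
rng = np.random.default_rng(0)
A, b = rng.normal(size=(3, 5)), np.zeros(3)
x = rng.uniform(0.2, 0.8, size=5)
x_adv = pgd_attack(A, b, x, label=1, eps=0.1, step=0.025, n_steps=8)
```

For a deep network the closed-form gradient would be replaced by back-propagation; the projection and clipping steps stay the same.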

3. Properties of adversarial training

In this section some effects and properties of adversarial training are reported, concerning the following aspects:

  • the qualitative and quantitative description of classification errors, measured by the accuracy gap (Sec. 3.1);

  • unbalanced classification problems (Sec. 3.2);

  • the characterization of output confidence (Sec. 3.3);

  • the characterization of boundaries (Sec. 3.4).

The aforementioned effects are supported by experiments reported in this section. Moreover, it is shown in Sec. 3.3 that an entropic regularization helps in obtaining robustness.

The properties and effects of adversarial training reported here have motivated WPGD, which is presented in the following section.

3.1. Accuracy gap

In order to ease the understanding of the results in this section, the notion of accuracy gap is defined as follows:

Definition 1.

Let $C_{pgd}$ and $C_{ce}$ be the confusion matrices of the robust and standard models, respectively. The accuracy gap $G$ is defined as the absolute difference between the confusion matrices, taken entry-wise:

$$G = |C_{pgd} - C_{ce}|.$$

Although it is known that robustness is obtained at the cost of accuracy [28, 49], it is still not clear in the literature whether this gap can be mitigated (on the MNIST dataset, high capacity networks reduce the accuracy gap to near zero; however, on more complex datasets, such as CIFAR-10, this gap exists even with very large networks). In this work a first step towards tackling this problem is taken by studying how errors are distributed between images and classes: it is shown in the following that misclassification errors are distributed following the visual metric, meaning that robust networks tend to destroy fine-grained classification. Qualitatively, the visual metric is a distance between classes that can be easily interpreted by humans. One approach for defining such a visual metric is to employ the distance from the boundaries of a deep neural network: in fact, [42] showed that NNs learn representations that are well aligned with our idea of visual similarity.

Due to the high dimensionality of the input, obtaining a good approximation of the visual metric is not easily feasible. However, it can be replaced by the semantic metric provided by WordNet [30], which is a good proxy for the visual metric, as also shown by [9]. For MNIST, a linear classifier on the input pixels is used, whereby the boundaries can be computed accurately.

Figure 1. CIFAR-10 dataset. Panel (a) reports the matrix of pairwise distances between classes. Classes are (top to bottom): airplane, car, bird, cat, deer, dog, frog, horse, ship, truck. Panel (b) shows the accuracy gap between a Wide Residual Network [53] trained using PGD and one trained with the standard cross-entropy loss.

Fig. 1 illustrates the results for CIFAR-10. In particular, Fig. 1(b) shows the accuracy gap between a Wide Residual Network [53] trained using PGD and one trained with the standard cross-entropy loss. From this figure, it is easy to see a visual correlation between the metric and the accuracy gap. Interestingly, the errors that are not explained by such a metric correspond to classes which are visually similar. For instance, Fig. 1 shows a gap on the pair bird-airplane, two classes that are visually similar but semantically different. Analogously, Fig. 2, Fig. 3 and Fig. 4 show the metrics and the relative accuracy gaps for MNIST, Tiny-ImageNet and CIFAR-100, respectively. Similar results can be identified for these datasets as well. Regarding MNIST, not surprisingly, digits "0" and "1" hardly fool each other. The most similar digits are "4" and "9": in fact, a small manipulation of such digits can be sufficient to make them indistinguishable. Regarding CIFAR-100, as an example, at indices 8-11 there is an evident cluster composed of the classes man, boy, woman and girl. Other strongly connected classes are bridge, skyscraper, house, castle and road. Moreover, there are animals that are semantically different but visually similar, such as the pair 32-90, which are seal and otter respectively. The bottom-right cluster represents flowers and plants.

Table 1 provides a quantitative measure (supporting the aforementioned 'visual' results) of the correlation between the accuracy gap and the relative metric. The minus sign is due to the fact that confusion matrices and distances are inversely correlated: when the values on the diagonal of the confusion matrices increase, the distance between classes decreases, on average. For MNIST the correlation is higher since an approximation of the actual visual metric has been used, while for CIFAR-10 and CIFAR-100 the correlation is lower because some pairs, for example bird-airplane, are semantically different. Moreover, it is remarked that with a high output dimension the correlation decreases even when there are well-correlated structures: the correlation between two random matrices in $\mathbb{R}^{200 \times 200}$ is almost zero in expectation.

              MNIST    CIFAR-10   CIFAR-100   Tiny-ImageNet
Correlation   -0.88    -0.65      -0.35       -0.22
Table 1. Correlation ρ between accuracy gap and relative metric for all the datasets.
Figure 2. MNIST dataset. Panels: (a) MNIST visual metric; (b) ϵ=20; (c) ϵ=38. Accuracy gap G between the baseline model and the PGD-trained model with ϵ=20 (panel (b)) and ϵ=38 (panel (c)). The gap in accuracy caused by PGD training correlates with the visual metric. This causes the network to be less effective in fine-grained classification.
Figure 3. Tiny-ImageNet. Panels: (a) WordNet metric; (b) accuracy gap; (c) confusion matrix of the robust network. Accuracy gap between the baseline model and the PGD-trained model with ϵ=8.
Figure 4. CIFAR-100. Panels: (a) WordNet metric; (b) All-CNN; (c) W-28-10. Accuracy gap between the baseline model and the PGD-trained model with ϵ=8. The main cluster in the upper-left part of panel (b) identifies animals in general.
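The accuracy gap of Definition 1 and a correlation of the kind reported in Table 1 can be sketched as follows. This is only an illustration: the confusion matrices and the distance matrix are made-up toy values, and restricting the Pearson correlation to off-diagonal entries is an assumption, since the text does not specify how ρ is computed.

```python
import numpy as np

def accuracy_gap(c_pgd, c_ce):
    """Entry-wise absolute difference between two confusion matrices."""
    return np.abs(np.asarray(c_pgd, dtype=float) - np.asarray(c_ce, dtype=float))

def offdiag_correlation(gap, dist):
    """Pearson correlation between the off-diagonal entries of two K x K matrices."""
    mask = ~np.eye(gap.shape[0], dtype=bool)
    return np.corrcoef(gap[mask], dist[mask])[0, 1]

# Hypothetical 3-class example: classes 0 and 1 are close, class 2 is far from both.
dist = np.array([[0.0, 1.0, 3.0],
                 [1.0, 0.0, 2.0],
                 [3.0, 2.0, 0.0]])
c_ce = np.array([[0.95, 0.04, 0.01],
                 [0.04, 0.94, 0.02],
                 [0.01, 0.02, 0.97]])
c_pgd = np.array([[0.80, 0.17, 0.03],
                  [0.15, 0.78, 0.07],
                  [0.03, 0.06, 0.91]])

gap = accuracy_gap(c_pgd, c_ce)
rho = offdiag_correlation(gap, dist)
```

In this toy example the gap is largest between the two closest classes, so the correlation with the distance matrix is negative, which is the sign convention of Table 1.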

Given these premises and observations, the following conjecture can be made: when the number of classes is high, the boundaries among similar classes become more complex. Thus, as an ablation study, two two-class problems on the CIFAR-10 dataset are reported in the following: the first problem is to distinguish the classes airplane (id: 0) and horse (id: 7), while the second is cat (id: 3) vs dog (id: 5). Fig. 5 shows that, even in simple settings, adversarial training dramatically affects fine-grained classification.

3.2. Unbalanced classification

Although real-world datasets are long-tailed [11], most of the experiments and theoretical findings on the accuracy-robustness trade-off in the literature were performed with balanced datasets [50].

Through an experimental analysis, it is shown that when classes are unbalanced, adversarial training can have dramatic effects on clean accuracy. For this analysis, the same two-class problems of the ablation study reported in Sec. 3.1 are selected: cat-vs-dog and airplane-vs-horse. The classes are artificially and randomly unbalanced such that their ratio is 0.3.

The cat-vs-dog classification problem is intrinsically difficult since the two classes have many features in common. Moreover, CIFAR-10 has low-resolution images, which sometimes makes this classification task non-trivial even for human classifiers. On the contrary, airplane-vs-horse is a simple task and thus one should expect that adversarial training does not decrease clean accuracy much.

The results of these two experiments are shown in Fig. 5, and two considerations follow. The first is that when classes are similar, as mentioned above, PGD heavily impacts performance with respect to standard training, whereas for dissimilar classes the effect is much less pronounced. This is a solid argument for supposing that using a single ϵ may not be optimal. The second consideration is that when the dataset is unbalanced, PGD further amplifies the difficulty of the classification task. For example, for cat-vs-dog (Fig. 5(d)), in the presence of unbalance, the model cannot be fit at all.

Figure 5. Panels: (a) standard training, 0-vs-7; (b) standard training, 3-vs-5; (c) unbalanced, 0-vs-7; (d) unbalanced, 3-vs-5. Panel (a) and Panel (b) report validation curves for the problems 0-7 and 3-5, respectively. Panel (c) and Panel (d) are the same curves with a randomly unbalanced dataset. For classes (3) and (5), when the dataset is unbalanced the validation error remains constant at the level of random choice (50%).

3.3. Entropy of softmax outputs

One of the issues of 'standardly' trained networks is that they are over-confident, that is, they tend to predict classes with high probability even when images are not clear [19]. Adversarial training can be seen as an implicit regularization and thus it is legitimate to analyze the confidence of the predictions of robust models. Indeed, Fig. 7 shows that another characteristic of adversarial training is that it reduces the confidence of predictions; in fact, the entropy of the softmax outputs of the robustly trained network is much higher. This suggests that confidence scores obtained by thresholding the softmax predictions should be changed. Thus, it may seem that robust representations are less discriminative than standard ones (for those who are not used to deep learning language, in this context a representation is the vector, output by the feature extractor, that is fed to the last layer, which is a linear classifier). It turns out that this intuition is true and supported by Fig. 6. In order to assess the structure of the representations, t-SNE [27] has been employed, a technique that allows high-dimensional data to be visualized in 2 or 3 dimensions. From Fig. 6, it is clear that robust representations are less clustered than natural ones. Each coloured cluster corresponds to one particular class.

Figure 6. CIFAR-10. Comparison of t-SNE computed on the representations of standard models (left) and robust models (right). Each coloured cluster corresponds to one particular class.
Figure 7. Entropy histograms of prediction confidence for W-16-10 with ϵ=8 for the class airplane. Panels: (a) Tiny-ImageNet; (b) CIFAR-10. Robust networks provide more conservative predictions. Adversarial training prevents the network from providing high-confidence predictions. This is a consequence of simplifying the boundaries, as shown in Sec. 3.4. Other classes follow the same trend.
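The entropy used to characterize prediction confidence can be computed as in the sketch below. The logits are made-up values and the function names are hypothetical; the sketch only shows that a peaked softmax output has low entropy, while a flatter (more conservative) one has higher entropy, bounded by log K.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

# Hypothetical logits: an over-confident prediction and a conservative one.
confident = softmax(np.array([[8.0, 0.0, 0.0]]))
hesitant = softmax(np.array([[1.0, 0.5, 0.0]]))

h_confident = prediction_entropy(confident)[0]
h_hesitant = prediction_entropy(hesitant)[0]
```

A histogram of such entropies over a test set, computed per class, would give plots of the kind described in the Figure 7 caption.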

3.4. PGD flattens boundaries

In order to better understand the behavior of PGD and also to compare it with WPGD, defined in Sec. 4, a simple classification problem with three classes is considered. Fig. 8 and Fig. 9 show the boundaries for PGD and WPGD (for different ϵ), respectively. Fig. 8(a) represents standard training, which achieves almost zero error. As ϵ increases, the boundaries become flatter, as orthogonal as possible to the gradient direction. The adopted cost matrix is

$$C = \begin{bmatrix} 0 & 10 & 0.01 \\ 10 & 0 & 1 \\ 0.01 & 1 & 0 \end{bmatrix}.$$

Related to these results, [33] showed experimentally that the main effect of PGD is to reduce the curvature of boundaries. However, it can be easily shown that even when the curvature is, robust training still has an effect. Moreover, it is noticed that gradients are more aligned with the vector which connects two classes. This is due to the "isotropic" effect of PGD, which tends to estimate more isotropic distributions. This is in accordance with [50], in which the authors observed that gradients of the robust model are more meaningful. This argument is also in accordance with the results on fine-grained classification presented in this work, suggesting that visually similar classes are separated by more complex boundaries. Instead, WPGD controls the regularization of boundaries through the cost matrix: boundaries for pairs of classes considered more similar are mostly preserved.

Figure 8. Effect of robust $\ell_2$-training on a simple classification problem. Panels: (a) original problem; (b) PGD (ϵ=0.2); (c) PGD (ϵ=0.4); (d) PGD (ϵ=0.8). PGD training flattens the boundaries.
Figure 9. Effect of directional robust $\ell_2$-training on a simple classification problem. Panels: (a) original problem; (b) WPGD (ϵ=0.2); (c) WPGD (ϵ=0.4); (d) WPGD (ϵ=0.8). WPGD flattens the boundaries where the cost is low and preserves them where the cost is high.
Remark 2.

One may claim that, since visually similar classes are separated by more complex boundaries, robustness is obviously hurt. However, the values of ϵ used for robust training are much smaller than the minimal distance between two images in the dataset. Thus, at least in principle, it is still not clear why it is not possible to obtain robustness and accuracy at the same time.

4. Wasserstein Projected Gradient Descent

Sec. 4.1 briefly reviews the necessary background on discrete optimal transport tools, while Sec. 4.2 introduces a new formulation of directional adversarial training.

4.1. Wasserstein metric and optimal transportation

The cost between classes, referred to as label metric, is defined in the following:

Definition 3 (Label metric $C^p$).

A symmetric positive semi-definite matrix $C \in \mathbb{R}_+^{K \times K}$ defines a pseudo-Riemannian metric on the domain; an entry $C_{k,k'}$ is the cost of transporting unit probability mass from class $k$ to class $k'$. Note that $C_{k,k} = 0$. The notation $C^p$ denotes the element-wise $p$-th power of $C$.

The other building blocks are the optimal transportation problem [41] and the Wasserstein metric over probability distributions. Given two probability distributions $q, q'$ supported on $K$ classes, the $p$-Wasserstein distance between $q$ and $q'$ for $p \in [1, \infty)$ is defined to be

$$W_p^p(q, q') = \inf_{\pi \in \Pi(q, q')} \langle \pi, C^p \rangle \qquad (3)$$

where $\Pi(q, q') = \{\pi \in \mathbb{R}_+^{K \times K} : q = \pi \mathbb{1},\ q' = \pi^\top \mathbb{1}\}$ is the set of joint probability distributions with $q$ as the right marginal and $q'$ as the left marginal; $\mathbb{1}$ denotes the all-ones vector and $\langle \cdot, \cdot \rangle$ is the Frobenius inner product on matrices. The Wasserstein distance is the optimal cost of transporting probability mass from an initial distribution $q$ to a final distribution $q'$. For $0 < p \le 1$, the Wasserstein distance in (3) is defined to be $W_p(q, q') = \inf_{\pi \in \Pi(q, q')} \langle \pi, C^p \rangle$; note the absence of the $p$-th power on the left-hand side. For any separable complete metric space $(\mathcal{X}, d)$ and $p > 0$, the metric space $(\mathcal{P}_p, W_p)$ is complete and separable, $\mathcal{P}_p$ being the set of probability distributions supported on $\mathcal{X}$ [2].

Problem (3) is called the Kantorovich relaxation [41] of the original optimal transport problem with $\Pi = \mathbb{R}_+^{K \times K}$ [31], and it takes $\mathcal{O}(K^3)$ operations to solve it using linear programming or interior point methods. [8] proposed a smoothed alternative to (3) by adding a convex negative entropic term

$$W_{p,\lambda}^p(q, q') = \inf_{\pi \in \Pi(q, q')} \langle \pi, C^p \rangle - \lambda^{-1} H(\pi), \qquad (4)$$

where $H(\pi) = -\sum_{k,k'=1}^K \pi_{k,k'} \log \pi_{k,k'}$, which enables an efficient algorithm based on the Sinkhorn-Knopp iteration [45] to approximate $\pi^*$. Large values of $\lambda$ give a better approximation to the exact distance $W_p^p$, and it can be shown that $W_{p,\lambda}^p$ converges to $W_p^p$ as $\lambda \to \infty$ [37].
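The entropic smoothing in (4) can be illustrated with a minimal Sinkhorn-Knopp sketch. It assumes strictly positive marginals and a cost matrix already raised to the power p; the function name `sinkhorn`, the iteration count and the toy distributions are hypothetical, and no convergence check or log-domain stabilization is included.

```python
import numpy as np

def sinkhorn(q, q_prime, cost, lam, n_iters=500):
    """Approximate the entropic-regularized plan of (4) by alternating scalings.

    Returns the plan pi (rows sum to q, columns approximately to q_prime)
    and the transport cost <pi, cost>.
    """
    K = np.exp(-lam * cost)              # Gibbs kernel
    u = np.ones_like(q)
    for _ in range(n_iters):
        v = q_prime / (K.T @ u)          # match the column marginal
        u = q / (K @ v)                  # match the row marginal
    pi = u[:, None] * K * v[None, :]
    return pi, float((pi * cost).sum())

# Toy example (hypothetical): three classes on a line, cost |i - j|.
q = np.array([0.5, 0.3, 0.2])
q_prime = np.array([0.2, 0.3, 0.5])
cost = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
pi, value = sinkhorn(q, q_prime, cost, lam=2.0)
```

For this one-dimensional toy cost the exact transport cost is 0.6 (the sum of absolute differences between the cumulative distributions), so the regularized plan returns a value that is not smaller than 0.6 and approaches it as `lam` grows.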

The Sinkhorn-Knopp iteration is a costly algorithm if the number of classes $K$ is large or the metric $C^p$ is complex. However, as the following lemma shows, if one of the probability distributions is a one-hot vector, one can compute the optimal transport $\pi^*$ in closed form. Indeed, in this paper, the $p$-Wasserstein distance is computed between the ground-truth $y(x)$ and the network predictions $\hat{y}(x)$, the former being a one-hot vector.

Lemma 4 (Closed-form Wasserstein distance).

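For any normalized $q$, if the target probability distribution $q'$ is a one-hot vector, the Wasserstein distance $W_p^p$ can be computed in closed form and is given by

$$W_p^p(q, q') = C^p_{\kappa^*}\, q = \sum_{k=1}^K C^p_{\kappa^*, k}\, q_k,$$

where $\kappa^* = \arg\max_k q'_k$ and $C^p_{\kappa^*}$ is the $\kappa^*$-th row of $C^p$. The optimal transport is such that its $(\kappa^*)$-th column is $q$.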

The proof of this lemma follows from the observation that the set $\Pi(q, q')$ is degenerate for one-hot $q'$: the constraints $\pi \mathbb{1} = q$ and $\pi^\top \mathbb{1} = q'$ force the $(\kappa^*)$-th column of $\pi$ to be simply $q$. Note that the Wasserstein distance is symmetric and therefore the same statement holds for $W_p^p(q', q)$. Finally, the regularized [8] Wasserstein Loss is defined as follows:

Definition 5 (Wasserstein Loss).

The Wasserstein Loss can now be defined as

$$W(\theta; x) = C^p_{\kappa(x)}\, \hat{y}(x) - \lambda^{-1} \log K \, H(\hat{y}(x)); \qquad (5)$$

here $C^p_{\kappa(x)}$ denotes the $\kappa(x)$-th row of the matrix $C^p \in \mathbb{R}_+^{K \times K}$. Note that computing $W(\theta; x)$ and back-propagating through it has the same computational complexity as standard cross-entropy.
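The closed form of Lemma 4 makes the Wasserstein loss a single inner product, as the following sketch illustrates. The cost matrix and predictions are toy values, the function name is hypothetical, and the entropy coefficient is exposed as a generic parameter `ent_coef` (an assumption standing in for the factor that multiplies $H$ in (5)), set to zero in the example.

```python
import numpy as np

def wasserstein_loss(y_hat, label, cost, p=1.0, ent_coef=0.0):
    """Closed-form transport cost to a one-hot target, minus an entropy term."""
    cost_p = cost ** p                          # element-wise p-th power of C
    transport = cost_p[label] @ y_hat           # row of C^p times the prediction
    entropy = -(y_hat * np.log(np.clip(y_hat, 1e-12, 1.0))).sum()
    return transport - ent_coef * entropy

# Hypothetical label metric: classes 0 and 1 are close, class 2 is far from 0.
cost = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])

near_mistake = np.array([0.6, 0.3, 0.1])   # error mass mostly on the close class
far_mistake = np.array([0.6, 0.1, 0.3])    # error mass mostly on the distant class

loss_near = wasserstein_loss(near_mistake, label=0, cost=cost)
loss_far = wasserstein_loss(far_mistake, label=0, cost=cost)
```

Both predictions have the same cross-entropy, since the probability of the true class is 0.6 in each case, but the Wasserstein loss penalizes the prediction that leaks mass towards the distant class more heavily.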

4.2. WPGD

The saddle point formulation can be modified with the Wasserstein loss (5), leading to the following definition.

Definition 6 (Robust Wasserstein loss).

The Robust Wasserstein loss is defined as

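$$\min_\theta \mathbb{E}_{x \in X}\, \mathrm{CE}(\theta; x^*), \qquad x^* = \arg\max_{\|x' - x\|_\infty \le \epsilon} W(\theta; x') \qquad (6)$$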

The outer loop remains the same, while the inner loop is responsible for finding the adversarial example which maximizes the Wasserstein loss $W$. This implies that at the beginning of training WPGD will prefer directions connecting visually distant classes, such as cat and truck, preventing the flattening of regions between similar classes. It is important to note that during training there is an implicit trade-off between choosing directions suggested by the metric and gradient directions. In fact, the loss gradient is nothing but an inner product of the gradients of the $K$ logits and the $k$-th row of $C$. Imposing an approximation of the real visual metric helps to efficiently explore the $\ell_\infty$-ball which, especially for high-dimensional inputs, can be hard to explore, leading to better results. For the WPGD experiments, the metrics previously described will be used.
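The inner loop of (6) can be sketched for a linear softmax classifier, where the gradient of the transport cost is available in closed form. This is an illustration under stated assumptions, not the authors' implementation: the entropy term of (5) is omitted, the names `transport_cost_and_grad` and `wpgd_attack` and the toy data are hypothetical, and the best iterate is tracked so that the returned point never has a lower cost than the clean input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def transport_cost_and_grad(A, b, x, cost_row):
    """Cost c . softmax(Ax + b) and its gradient w.r.t. the input x."""
    y_hat = softmax(A @ x + b)
    value = cost_row @ y_hat
    grad_logits = y_hat * (cost_row - value)   # softmax Jacobian applied to c
    return value, A.T @ grad_logits

def wpgd_attack(A, b, x, label, cost, eps, step, n_steps, p=1.0):
    """Sign-gradient ascent on the transport cost inside the l_inf ball."""
    cost_row = cost[label] ** p
    x_adv = x.copy()
    best_x, best_val = x.copy(), transport_cost_and_grad(A, b, x, cost_row)[0]
    for _ in range(n_steps):
        _, g = transport_cost_and_grad(A, b, x_adv, cost_row)
        x_adv = np.clip(x_adv + step * np.sign(g), x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
        val = transport_cost_and_grad(A, b, x_adv, cost_row)[0]
        if val > best_val:
            best_x, best_val = x_adv.copy(), val
    return best_x

# Toy problem (hypothetical data): 3 classes, 4 input features.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(3, 4)), np.zeros(3)
x = rng.uniform(0.2, 0.8, size=4)
cost = np.array([[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]])
x_adv = wpgd_attack(A, b, x, label=0, cost=cost, eps=0.1, step=0.025, n_steps=8)
```

Because the cost row weights each class probability, the ascent direction is pulled towards classes with a high transport cost, which is the directional behavior described above.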

5. Experiments

This Section provides the experimental findings of the WPGD approach.

5.1. Datasets and networks

In this paper, the MNIST [25], CIFAR-10 and CIFAR-100 [23], and Tiny-ImageNet [1] datasets are used for the experiments. For all datasets, images are normalized to have pixel intensities in [0,1]. The adversarial vulnerability of neural networks increases with the number of output classes [13]. In this context, it is worth emphasizing that the Tiny-ImageNet dataset, with 200 classes, is a viable dataset for benchmarking adversarial learning algorithms; this dataset is however less popular in the literature, which primarily focuses on MNIST and CIFAR-10. For the CIFAR datasets, standard data augmentation is used, which involves mirror flipping with probability 0.5 and random crops of size 32×32 after padding images by 4 pixels on each side. The following networks are used in all the experiments:

  1. W-16-10: Wide-Residual network of [53] with 16 layers, a widening factor of 10, weight decay of $5 \times 10^{-4}$ and zero dropout.

  2. W-40-10: Wide-Residual network of [53] with 40 layers, a widening factor of 10, weight decay of $5 \times 10^{-4}$ and zero dropout.

  3. W-28-10: Wide-Residual network of [53] with 28 layers, a widening factor of 10, weight decay of $5 \times 10^{-4}$ and zero dropout.

All networks are trained with stochastic gradient descent (SGD), Nesterov’s momentum of 0.9 and mini-batch size of 128.
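The CIFAR data augmentation described in this section (mirror flipping with probability 0.5, then a random 32×32 crop after padding by 4 pixels) can be sketched as follows. Zero padding is an assumption, since the padding mode is not stated in the text, and the function name `augment` is hypothetical.

```python
import numpy as np

def augment(image, rng, pad=4, flip_prob=0.5):
    """Random mirror flip followed by a random crop of the original size."""
    h, w, _ = image.shape
    if rng.random() < flip_prob:
        image = image[:, ::-1, :]                    # horizontal mirror flip
    # Zero padding on each side (assumed mode), then crop back to h x w.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))   # stand-in for a normalized CIFAR image
augmented = augment(image, rng)
```

The output keeps the input size and the [0,1] intensity range, so it can be fed to the network without further normalization.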

5.2. Algorithms

The following three algorithms will be compared:

  1. CE: This is the standard cross-entropy loss CE defined in (1).

  2. PGD: This is the algorithm of [28]; the saddle-point problem (2) is solved with 8 steps in the inner loop to compute the adversarial image.

  3. WPGD: This is the robust Wasserstein loss described in (6), where the inner loop of PGD searches for the adversarial image that maximizes the Wasserstein transport cost. The computational complexity of WPGD is the same as that of PGD. WPGD is evaluated with three different values of p: 1, 2.5 and 10.

W-s-10 represents the Wide-ResNet architecture with s layers. In order to test robustness, 20-step PGD attacks are performed starting from a random (uniformly sampled) position inside the $\ell_\infty$ ball around the test image x. All the WPGD experiments are run with the cost matrix provided by the WordNet metric [30].

5.3. Directional robustness of WPGD

Table 2 reports the main results of natural training (CE) and robust training (PGD) for CIFAR-10 and Tiny-ImageNet. Sec. 5.4 reports a summary table for quantitative results on directional robustness. Fig. 10 instead shows the trade-off arising from WPGD training: as p increases, fine-grained classification is better preserved. In addition to standard accuracy, the characterizations of adversarial robustness of PGD and WPGD are compared. Fig. 11 shows that WPGD-trained networks with a strong metric tend to be more robust between visually distant classes, which supports our claims. For the sake of clarity, only results for CIFAR-10 and Tiny-ImageNet and for W-16-10 are reported. Interestingly, WPGD is less robust than PGD for the classes bird and airplane: thus, imposing a metric, even if it is only approximately correct, seems to help to obtain more visually meaningful errors.

                  CE                PGD
                  16-10    28-10    16-10    28-10
CIFAR-10
  NE              4.4      3.9      14.11    13.9
  AE              100      100      34.5     31.25
Tiny-ImageNet
  NE              37.7     36.9     55.3     36.9
  AE              99.9     100      70.4     70.5
Table 2. Summary of errors in [%] for W-16-10 and W-28-10, with ϵ=4 and k=20 under $\ell_\infty$ perturbations.
Figure 10. Accuracy & robustness trade-off: Results for W-16-10 and ϵ=4, for CIFAR-10\xspace (left) and Tiny-ImagetNet\xspace (right). Increasing p (x-axis), enables to improve accuracy on fine-grained classification at the price of robustness on pairs of similar classes
(a) PGD
(b) WPGD
Figure 11. CIFAR-10. Characterization of adversarial robustness for WPGD and PGD defenses. Applied perturbations have norm ϵ=16 on W-28-10. WPGD is performed using p=2.5. Panel (b) shows that WPGD obtains directional robustness: in this case the network is more robust to perturbations between two visually different classes.

5.4. Supplementary comparisons for CE, PGD and WPGD

Figures 12 and 13 report the curves for PGD and WPGD on CIFAR-10 and Tiny-ImageNet, respectively. Moreover, Table 3 reports a summary of the weighted robustness score S, defined as:

$$ S = \sum_{i,j} c_{ij}\, m_{ij} \tag{7} $$

where $M=\{m_{ij}\}_{i,j=1}^{K}$ is the adversarial confusion matrix and $C=\{c_{ij}\}_{i,j=1}^{K}$ is the metric of the given dataset. Attacks are computed by maximizing the loss in Eq. (6), that is, considering the worst-case scenario in which the attacker knows the metric. This score gives more weight to errors with a high cost. To make the results legible, the zero reference is set to the PGD-trained model. As can be seen, increasing p reduces the score S, which means that, on average, adversarial misclassifications fall on more similar classes.
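
As an illustration of Eq. (7), the score can be computed in a few lines of NumPy. This is a sketch under stated assumptions: the confusion matrix is taken to hold raw counts (the text does not say whether it is normalized), and the 3-class cost matrix and labels below are made up for the example.

```python
import numpy as np

def adversarial_confusion(y_true, y_adv_pred, n_classes):
    """m[i, j] = number of samples of true class i predicted as class j under attack."""
    M = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(M, (np.asarray(y_true), np.asarray(y_adv_pred)), 1)
    return M

def weighted_robustness_score(M, C):
    """S = sum_{i,j} c_ij * m_ij, as in Eq. (7)."""
    return float(np.sum(np.asarray(C, dtype=float) * np.asarray(M, dtype=float)))

# Hypothetical cost matrix: classes 0 and 1 are close, class 2 is far from both.
C = np.array([[0, 1, 3],
              [1, 0, 3],
              [3, 3, 0]])
y_true     = [0, 0, 1, 2, 2]
y_adv_pred = [1, 0, 0, 2, 0]   # predictions on adversarial inputs

M = adversarial_confusion(y_true, y_adv_pred, n_classes=3)
S = weighted_robustness_score(M, C)   # 1 (0->1) + 1 (1->0) + 3 (2->0) = 5.0
```

Correct predictions lie on the diagonal of C and cost nothing, so S only accumulates the cost of adversarial errors. The values reported relative to a reference model would be obtained by subtracting that model's S.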

Figure 12. CIFAR-10. Comparisons: (Left) standard training vs PGD; (Middle) different architectures on robust training; (Right) PGD vs WPGD. WPGD is slightly better in terms of accuracy.
Figure 13. Tiny-ImageNet. Comparisons: (Top) standard training vs PGD; (Middle) different architectures on robust training; (Bottom) PGD vs WPGD. WPGD is slightly better in terms of accuracy.
Model    AE [%]  p     Dataset        S
W-16-10  34.53   0.0   CIFAR-10       -0.14
W-16-10  34.62   1.0   CIFAR-10       -0.26
W-16-10  34.98   2.5   CIFAR-10       -0.34
W-16-10  39.76   10.0  CIFAR-10       -0.53
W-28-10  31.24   0.0   CIFAR-10       0.00
W-16-10  70.23   1.0   Tiny-ImageNet  -6.33
W-16-10  73.61   2.5   Tiny-ImageNet  -12.45
W-16-10  92.62   10.0  Tiny-ImageNet  -55.17
W-28-10  69.84   1.0   Tiny-ImageNet  -9.62
W-28-10  69.69   0.0   Tiny-ImageNet  -9.48
Table 3. Summary of the weighted robustness score S defined in Eq. (7) for ϵ=4. To make the results more legible, the zero reference (for each dataset) is set to the PGD-trained model. As can be seen, increasing p reduces the score S, which means that, on average, adversarial misclassifications fall on more similar classes.

6. Related work

This work is related to [28, 48]. Although these works give theoretical and practical results on the connection between robustness and accuracy for adversarial training, they do not analyze how the accuracy gap is distributed. They also argue that adversarial training requires extra capacity in order to build complex boundaries [22]. In contrast, [33] have recently argued that adversarial training leads to flatter decision boundaries and that, in fact, explicitly penalizing the curvature of the decision boundary is a good technique to train robust classifiers. Results in this paper corroborate these findings. As the experiments show, the accuracy gap of adversarially trained networks with respect to standard cross-entropy trained networks can be explained very well by the network misclassifying pairs of similar classes. Semantic metrics to aid visual classification, e.g., those derived from WordNet [30], have been a popular way to introduce a new data modality in standard supervised learning [9, 10]. This paper identifies the inherent visual metric that the network induces while being trained with the cross-entropy loss or the adversarial loss. Lastly, using an optimal transport formulation to impose a metric on the label space of deep networks bears close resemblance to the work of [15], which uses the Wasserstein loss computed with the Sinkhorn-Knopp iteration to predict multi-label images. The present paper is the first to use the optimal transport formulation to induce cost-sensitive adversarial training of deep networks. Further, for single-label images, it is shown that the optimal transport problem has a closed-form solution, which makes it computationally equivalent to the cross-entropy loss; this simple but powerful property may be of independent interest for problems like hierarchical classification [17, 51, 4].
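
The closed-form property for single-label images can be illustrated with a short sketch. When the target is a one-hot distribution on class y, every unit of predicted mass has to be moved to y, so the transport plan is unique and the cost reduces to a weighted sum over one column of the cost matrix. The code below is a hedged illustration, not the paper's implementation: the function names are hypothetical, and raising the ground cost to the power p is an assumption about how p enters the loss.

```python
import numpy as np

def transport_cost_to_label(q, y, C, p=1.0):
    """Transport cost between prediction q and the one-hot target at class y.

    With a one-hot target the only feasible plan moves all mass q[i] to y,
    so the cost is sum_i q[i] * C[i, y] ** p (exponent p is an assumption).
    """
    q = np.asarray(q, dtype=float)
    C = np.asarray(C, dtype=float)
    return float(q @ (C[:, y] ** p))

def unique_plan(q, y):
    """The single coupling with row marginal q and column marginal onehot(y)."""
    q = np.asarray(q, dtype=float)
    plan = np.zeros((len(q), len(q)))
    plan[:, y] = q
    return plan

# Hypothetical 3-class example: classes 0 and 1 are close, class 2 is far.
C = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 3.0],
              [3.0, 3.0, 0.0]])
q = np.array([0.7, 0.2, 0.1])      # softmax output of the network
cost = transport_cost_to_label(q, y=0, C=C)   # 0.2 * 1 + 0.1 * 3
plan = unique_plan(q, y=0)
```

Evaluating this cost takes one pass over the K class probabilities, the same order of work as a cross-entropy evaluation, and no Sinkhorn iteration is needed.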

7. Conclusions and future work

While the literature on adversarial training is flourishing, in-depth studies of its implications and of its sensitivity in common real-world applications are still lacking. In particular, this paper focused on applications that are cost-sensitive or whose datasets are unbalanced. Moreover, due to the intrinsic trade-off between robustness and accuracy, it is of paramount importance to be able to govern this trade-off when designing and implementing machine and deep learning-based applications where a certain level of accuracy is required. In light of this, the present paper made several advances toward better understanding robustness on the one hand and being able to semantically control it on the other.

In particular, this paper identified that the accuracy gap in adversarial training comes from the loss of fine-grained classification capabilities in neural networks. This observation motivates the optimal transport formulation: a metric on the label space that measures the distance to the boundary for standard cross-entropy training or, often equivalently, a semantic metric obtained from external data modalities such as WordNet, reduces the search space and makes it easier to discover—and fix—these classes during adversarial training, resulting in an improvement of accuracy at the cost of (directional) robustness. It is conceivable that, although a high-dimensional classifier may always remain vulnerable to adversarial perturbations, it is possible to build robust, real-world systems by incorporating such diverse data. Thus, this work is a first step toward a principled robust training for real-world applications involving artificial intelligence and deep learning.

Future work will study methodologies or heuristics to systematically control the robustness-accuracy trade-off without the need to tune ϵ as a hyper-parameter. Another future direction of research is the application of the WPGD approach to other problems such as fraud detection and predictive maintenance.


The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research and Amazon Web Services for donating research credits.




  • [1] Tinyimagenet.
  • [2] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008.
  • [3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv:1802.00420, 2018.
  • [4] Hessam Bagherinezhad, Maxwell Horton, Mohammad Rastegari, and Ali Farhadi. Label refinery: Improving imagenet classification through label progression. arXiv:1805.02641, 2018.
  • [5] Aykut Beke and Tufan Kumbasar. Learning with type-2 fuzzy activation functions to improve the performance of deep neural networks. Engineering Applications of Artificial Intelligence, 85:372–384, 2019.
  • [6] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [7] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
  • [8] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in neural information processing systems, pages 2292–2300, 2013.
  • [9] Jia Deng, Alexander C. Berg, Kai Li, and Li Fei-Fei. What does classifying more than 10,000 image categories tell us? Lecture Notes in Computer Science, pages 71–84, 2010.
  • [10] Jia Deng, Nan Ding, Yangqing Jia, Andrea Frome, Kevin Murphy, Samy Bengio, Yuan Li, Hartmut Neven, and Hartwig Adam. Large-scale object classification using label relation graphs. Lecture Notes in Computer Science, pages 48–64, 2014.
  • [11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
  • [12] Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. A guide to deep learning in healthcare. Nature medicine, 25(1):24, 2019.
  • [13] Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. arXiv:1802.08686, 2018.
  • [14] Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers robustness to adversarial perturbations. Machine Learning, 107(3):481–508, 2018.
  • [15] Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A Poggio. Learning with a wasserstein loss. In Advances in Neural Information Processing Systems, pages 2053–2061, 2015.
  • [16] Zehai Gao, Cunbao Ma, Yige Luo, and Zhiyue Liu. Ima health state evaluation using deep feature learning with quantum neural network. Engineering Applications of Artificial Intelligence, 76:119–129, 2018.
  • [17] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. 2013.
  • [18] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv:1412.6572, 2014.
  • [19] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1321–1330. JMLR. org, 2017.
  • [20] Olakunle Ibitoye, Omair Shafiq, and Ashraf Matrawy. Analyzing adversarial attacks against deep learning for intrusion detection in iot networks. arXiv preprint arXiv:1905.05137, 2019.
  • [21] Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.
  • [22] J Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.
  • [23] A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Computer Science, University of Toronto, 2009.
  • [24] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • [25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [26] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • [27] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
  • [28] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083, 2017.
  • [29] Marco Maggipinto, Matteo Terzi, Chiara Masiero, Alessandro Beghi, and Gian Antonio Susto. A computer vision-inspired deep learning architecture for virtual metrology modeling with 2-dimensional data. IEEE Transactions on Semiconductor Manufacturing, 31(3):376–384, 2018.
  • [30] George A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
  • [31] Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris, 1781.
  • [32] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. arXiv:1610.08401, 2017.
  • [33] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard. Robustness via curvature regularization, and vice versa. 2018.
  • [34] Ali Bou Nassif, Ismail Shahin, Imtinan Attili, Mohammad Azzeh, and Khaled Shaalan. Speech recognition using deep neural networks: A systematic review. IEEE Access, 7:19143–19165, 2019.
  • [35] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv:1605.07277, 2016.
  • [36] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv:1602.02697, 2016.
  • [37] Gabriel Peyré and Marco Cuturi. Computational optimal transport. arXiv:1803.00567, 2018.
  • [38] Adnan Qayyum, Muhammad Usama, Junaid Qadir, and Ala Al-Fuqaha. Securing connected & autonomous vehicles: Challenges posed by adversarial machine learning and the way forward. arXiv preprint arXiv:1905.12762, 2019.
  • [39] Xing Qi. Rotor resistance and excitation inductance estimation of an induction motor using deep-q-learning algorithm. Engineering Applications of Artificial Intelligence, 72:67–79, 2018.
  • [40] Hassan Rafique, Mingrui Liu, Qihang Lin, and Tianbao Yang. Non-convex min-max optimization: Provable algorithms and applications in machine learning, 2018.
  • [41] Filippo Santambrogio. Optimal transport for applied mathematicians. Birkhäuser, NY, 2015.
  • [42] Andrew M Saxe, James L McClelland, and Surya Ganguli. A mathematical theory of semantic development in deep neural networks. Proceedings of the National Academy of Sciences, 116(23):11537–11546, 2019.
  • [43] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. arXiv:1804.11285, 2018.
  • [44] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. 2018.
  • [45] Richard Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. The annals of mathematical statistics, 35(2):876–879, 1964.
  • [46] Gian Antonio Susto, Andrea Schirru, Simone Pampuri, Seán McLoone, and Alessandro Beghi. Machine learning for predictive maintenance: A multiple classifier approach. IEEE Transactions on Industrial Informatics, 11(3):812–820, 2014.
  • [47] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv:1312.6199, 2013.
  • [48] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. 2018.
  • [49] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. There is no free lunch in adversarial robustness (but there are unexpected benefits). arXiv:1805.12152, 2018.
  • [50] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019.
  • [51] Cinna Wu, Mark Tygert, and Yann LeCun. Hierarchical loss for classification. arXiv:1709.01062, 2017.
  • [52] Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.
  • [53] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.