Cross-Input Certified Training for Universal Perturbations

Abstract

Existing work in trustworthy machine learning primarily focuses on single-input adversarial perturbations. In many real-world attack scenarios, input-agnostic adversarial attacks, e.g., universal adversarial perturbations (UAPs), are much more feasible. Current certified training methods train models robust to single-input perturbations but achieve suboptimal clean and UAP accuracy, thereby limiting their practical applicability. We propose a novel method, CITRUS, for certified training of networks robust against UAP attackers. We show in an extensive evaluation across different datasets, architectures, and perturbation magnitudes that our method outperforms traditional certified training methods on standard accuracy (by up to 10.3%) and achieves state-of-the-art performance on the more practical certified UAP accuracy metric.
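The abstract contrasts single-input attacks, where the adversary picks a fresh perturbation for every input, with universal attacks, where one perturbation is shared across all inputs. The sketch below illustrates why the universal setting is less demanding. It is not CITRUS and not the paper's definition of certified UAP accuracy; it is a hypothetical toy that assumes a linear binary classifier f(x) = sign(w·x + b) and an L∞ budget eps, where both worst cases can be computed exactly. The function names are illustrative only.

```python
import numpy as np


def per_input_robust_accuracy(w, b, X, y, eps):
    """Fraction of points that NO per-input L-inf perturbation of size eps can flip.

    Toy linear classifier sign(w @ x + b) with labels y in {-1, +1}. Each input's
    adversary can shift its logit by up to c = eps * ||w||_1 in the harmful direction.
    """
    scores = X @ w + b
    c = eps * np.abs(w).sum()
    return float(np.mean(y * scores > c))


def universal_worst_case_accuracy(w, b, X, y, eps):
    """Accuracy under the single worst SHARED perturbation delta, ||delta||_inf <= eps.

    For a linear model, delta acts only through the scalar t = w @ delta, which ranges
    over [-c, c]. The number of correct points is piecewise constant in t, so its minimum
    is attained at an endpoint or at a breakpoint t = -score_i inside the interval.
    """
    scores = X @ w + b
    c = eps * np.abs(w).sum()
    candidates = np.concatenate(([-c, c], -scores[np.abs(scores) <= c]))
    # Rows: candidate shared shifts t; columns: data points.
    correct = y[None, :] * (scores[None, :] + candidates[:, None]) > 0
    return float(correct.mean(axis=1).min())


if __name__ == "__main__":
    w, b = np.array([1.0, -1.0]), 0.0
    X = np.array([[1.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.0, 1.0]])
    y = np.array([1, 1, -1, -1])
    print(per_input_robust_accuracy(w, b, X, y, eps=0.25))     # 0.5
    print(universal_worst_case_accuracy(w, b, X, y, eps=0.25))  # 0.75
```

In this toy example, a per-input adversary can flip both low-margin points, but any single shared shift that hurts one class helps the other, so at most one point is lost. Every point that is robust per input is also correct under any shared perturbation, so the universal worst-case accuracy is never lower than the per-input robust accuracy.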
