Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models

  • 2024-10-16 14:48:37
  • Fan Yang, Yihao Huang, Kailong Wang, Ling Shi, Geguang Pu, Yang Liu, Haoyu Wang

Abstract

Vision-language pre-training (VLP) models, trained on large-scale image-text pairs, have become widely used across a variety of downstream vision-and-language (V+L) tasks. This widespread adoption raises concerns about their vulnerability to adversarial attacks. Non-universal adversarial attacks, while effective, are often impractical for real-time online applications due to their high computational demands per data instance. Recently, universal adversarial perturbations (UAPs) have been introduced as a solution, but existing generator-based UAP methods are significantly time-consuming. To overcome the limitation, we propose a direct optimization-based UAP approach, termed DO-UAP, which significantly reduces resource consumption while maintaining high attack performance. Specifically, we explore the necessity of multimodal loss design and introduce a useful data augmentation strategy. Extensive experiments conducted on three benchmark VLP datasets, six popular VLP models, and three classical downstream tasks demonstrate the efficiency and effectiveness of DO-UAP. Specifically, our approach drastically decreases the time consumption by 23-fold while achieving a better attack performance.
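
To make the idea of optimizing a universal perturbation directly (rather than training a generator network) concrete, here is a minimal sketch. It is not the authors' DO-UAP implementation: the abstract does not specify the multimodal loss or the data augmentation strategy, so both are left out. The toy linear encoder `W`, the dot-product similarity objective, the signed-gradient update, and the names `optimize_uap` and `mean_similarity` are all assumptions made for illustration.

```python
import numpy as np

def optimize_uap(images, text_emb, W, eps=8/255, step=1/255,
                 epochs=5, batch=16, seed=0):
    """Directly optimize one shared perturbation delta (no generator network).

    images:   (N, D) flattened images in [0, 1]
    text_emb: (N, K) embeddings of the paired captions
    W:        (K, D) toy linear image encoder (hypothetical stand-in for a VLP model)
    """
    rng = np.random.default_rng(seed)
    delta = np.zeros(images.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(images))
        for start in range(0, len(images), batch):
            idx = order[start:start + batch]
            # Similarity of image i to its caption is (W @ (x_i + delta)) . t_i,
            # so its gradient w.r.t. delta is W.T @ t_i; average over the batch.
            grad = (text_emb[idx] @ W).mean(axis=0)
            # Signed step that lowers similarity, then projection onto the
            # L-infinity ball of radius eps so the perturbation stays small.
            delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta

def mean_similarity(images, text_emb, W, delta=0.0):
    """Average image-caption similarity under the toy encoder."""
    adv = np.clip(images + delta, 0.0, 1.0)  # keep pixels in the valid range
    return float(np.mean(np.sum((adv @ W.T) * text_emb, axis=1)))
```

The same `delta` is added to every image, which is what makes the perturbation universal: once it has been optimized, attacking a new input costs a single addition instead of a per-instance optimization. With a real VLP model, the analytic gradient above would be replaced by backpropagation through the image and text encoders.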

 
