Safety at Scale: A Comprehensive Survey of Large Model Safety

  • 2025-03-19 17:10:18
  • Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Jiaming Zhang, Xiang Zheng, Yang Bai, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Xudong Han, Haonan Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Tim Baldwin, Bo Li, Masashi Sugiyama, Dacheng Tao, James Bailey, Yu-Gang Jiang

Abstract

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.

 
