Safety at Scale: A Comprehensive Survey of Large Model Safety

Abstract

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.
