JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Abstract

The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: https://chonghan-chen.com/llm-jailbreak-zoo-survey/.
