Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

Abstract

Vision-language alignment in Large Vision-Language Models (LVLMs) successfully enables LLMs to understand visual input. However, we find that existing vision-language alignment methods fail to transfer the safety mechanism that LLMs already have for text to vision, which leaves LVLMs vulnerable to toxic images. To explore the cause of this problem, we explain where and how the safety mechanism of LVLMs operates and conduct a comparative analysis between text and vision. We find that the hidden states at specific transformer layers play a crucial role in successfully activating the safety mechanism, while the vision-language alignment at the hidden-state level in current methods is insufficient. This results in a semantic shift of input images relative to text in the hidden states, which misleads the safety mechanism. To address this, we propose a novel Text-Guided vision-language Alignment method (TGA) for LVLMs. TGA retrieves texts related to the input image and uses them to guide the projection of vision into the hidden-state space of the LLM. Experiments show that TGA not only transfers the safety mechanism for text in base LLMs to vision during vision-language alignment for LVLMs, without any safety fine-tuning on the visual modality, but also maintains general performance on various vision tasks (Safe and Good).
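The abstract names two ideas without giving formulas: a per-layer "semantic shift" between image and text hidden states, and retrieval of related texts to guide the vision projection. The snippet below is a minimal, illustrative sketch of both, not TGA's actual objective. It assumes cosine-similarity retrieval over a precomputed bank of (text, embedding) pairs, and it measures shift as the mean of (1 - cosine similarity) over a chosen set of layers. The names `retrieve_related_texts`, `semantic_shift`, and the text bank are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_related_texts(
    image_emb: Vector, text_bank: Sequence[Tuple[str, Vector]], k: int = 1
) -> List[str]:
    """Hypothetical retrieval step: return the k texts whose embeddings are
    most cosine-similar to the image embedding."""
    ranked = sorted(
        range(len(text_bank)),
        key=lambda i: cosine(image_emb, text_bank[i][1]),
        reverse=True,
    )
    return [text_bank[i][0] for i in ranked[:k]]


def semantic_shift(
    vision_states: Sequence[Vector],
    text_states: Sequence[Vector],
    layers: Sequence[int],
) -> float:
    """Assumed shift measure: mean (1 - cosine) between the image's and the
    paired text's hidden states at the selected transformer layers.
    Used as a loss, lowering it pulls projected vision toward the text."""
    if not layers:
        raise ValueError("at least one layer must be selected")
    shifts = [1.0 - cosine(vision_states[l], text_states[l]) for l in layers]
    return sum(shifts) / len(shifts)
```

For example, with a two-entry bank `[("a cat on a sofa", [1.0, 0.0]), ("a knife on a table", [0.0, 1.0])]`, an image embedding of `[0.1, 0.9]` retrieves `"a knife on a table"`. The retrieved text's hidden states would then serve as the target when computing `semantic_shift` against the projected image's hidden states. Restricting the measure to selected layers reflects the abstract's finding that specific layers matter for activating the safety mechanism; which layers those are is not stated here.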
