SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection

Abstract

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, limiting scalability. Recent advancements in large-scale visual-language models have significantly improved zero/few-shot anomaly detection. However, these approaches may not fully utilize hierarchical features, potentially missing nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets, demonstrating superior performance by leading in 18 out of 20 metrics compared to existing state-of-the-art techniques.
