Abstract
Zero-shot domain adaptation is a method for adapting a model to a target domain without utilizing target domain image data. To enable adaptation without target images, existing studies utilize CLIP's embedding space and text description to simulate target-like style features. Despite the previous achievements in zero-shot domain adaptation, we observe that these text-driven methods struggle to capture complex real-world variations and significantly increase adaptation time due to their alignment process. Instead of relying on text descriptions, we explore solutions leveraging image data, which provides diverse and more fine-grained style cues. In this work, we propose SIDA, a novel and efficient zero-shot domain adaptation method leveraging synthetic images. To generate synthetic images, we first create detailed, source-like images and apply image translation to reflect the style of the target domain. We then utilize the style features of these synthetic images as a proxy for the target domain. Based on these features, we introduce Domain Mix and Patch Style Transfer modules, which enable effective modeling of real-world variations. In particular, Domain Mix blends multiple styles to expand the intra-domain representations, and Patch Style Transfer assigns different styles to individual patches. We demonstrate the effectiveness of our method by showing state-of-the-art performance in diverse zero-shot adaptation scenarios, particularly in challenging domains. Moreover, our approach achieves high efficiency by significantly reducing the overall adaptation time.
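To make the two modules named above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a "style feature" is the channel-wise mean and standard deviation of a feature map and that styles are applied with AdaIN-style renormalization; the abstract specifies neither. The function names (`channel_stats`, `adain`, `domain_mix`, `patch_style_transfer`), the Dirichlet mixing weights, and the regular grid of patches are all hypothetical choices made for illustration.

```python
import numpy as np

def channel_stats(feat, eps=1e-5):
    """Channel-wise mean and std of a (C, H, W) feature map (assumed style feature)."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps
    return mu, sigma

def adain(content, mu_s, sigma_s):
    """Re-normalize content features to the given style statistics (AdaIN-style)."""
    mu_c, sigma_c = channel_stats(content)
    return sigma_s * (content - mu_c) / sigma_c + mu_s

def domain_mix(styles, rng):
    """Blend several (mu, sigma) pairs with random convex weights (assumed scheme)."""
    w = rng.dirichlet(np.ones(len(styles)))
    mu = sum(wi * m for wi, (m, _) in zip(w, styles))
    sigma = sum(wi * s for wi, (_, s) in zip(w, styles))
    return mu, sigma

def patch_style_transfer(content, styles, grid, rng):
    """Give each cell of a grid x grid partition its own freshly mixed style."""
    _, h, w = content.shape
    out = np.empty_like(content)
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    for i in range(grid):
        for j in range(grid):
            rows = slice(ys[i], ys[i + 1])
            cols = slice(xs[j], xs[j + 1])
            mu, sigma = domain_mix(styles, rng)
            out[:, rows, cols] = adain(content[:, rows, cols], mu, sigma)
    return out

# Toy usage: random arrays stand in for source features and for the
# features of two synthetic target-style images.
rng = np.random.default_rng(0)
source_feat = rng.normal(size=(4, 8, 8))
synthetic_styles = [channel_stats(rng.normal(loc=k, size=(4, 8, 8))) for k in (1.0, 3.0)]
stylized = patch_style_transfer(source_feat, synthetic_styles, grid=2, rng=rng)
```

In this sketch, mixing in statistic space keeps every blended style inside the convex hull of the synthetic-image styles, while the per-patch loop lets a single feature map carry several styles at once. Whether SIDA mixes or partitions features in this way is not stated in the abstract.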