Abstract
Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion. Unlike image-to-image translation, which prioritizes structural fidelity, stylized abstraction demands selective retention of identity cues while embracing stylistic divergence, which is especially challenging for out-of-distribution individuals. We propose a training-free framework that generates stylized abstractions from a single image using inference-time scaling in vision-language models (VLLMs) to extract identity-relevant features, and a novel cross-domain rectified flow inversion strategy that reconstructs structure based on style-dependent priors. Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style. It supports multi-round abstraction-aware generation without fine-tuning. To evaluate this task, we introduce StyleBench, a GPT-based human-aligned metric suited for abstract styles where pixel-level similarity fails. Experiments across diverse abstractions (e.g., LEGO, knitted dolls, South Park) show strong generalization to unseen identities and styles in a fully open-source setup.