Abstract
We demonstrate that discriminative models inherently contain powerful generative capabilities, challenging the fundamental distinction between discriminative and generative architectures. Our method, Direct Ascent Synthesis (DAS), reveals these latent capabilities through multi-resolution optimization of CLIP model representations. While traditional inversion attempts produce adversarial patterns, DAS achieves high-quality image synthesis by decomposing the optimization across multiple spatial scales (1x1 to 224x224), requiring no additional training. This approach not only enables diverse applications, from text-to-image generation to style transfer, but also maintains natural image statistics ($1/f^2$ spectrum) and guides the generation away from non-robust adversarial patterns. Our results demonstrate that standard discriminative models encode substantially richer generative knowledge than previously recognized, providing new perspectives on model interpretability and the relationship between adversarial examples and natural image synthesis.
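To make the multi-scale decomposition concrete, the following is a minimal sketch of how an image could be parameterized as a sum of components at several resolutions, each upsampled to the full 224x224 grid. It is an illustration under stated assumptions, not the implementation used here: the resolution schedule `RESOLUTIONS`, the nearest-neighbour upsampling, the single-channel image, and the helper names `upsample_nearest` and `compose` are all hypothetical. The optimization step itself (gradient ascent on a CLIP similarity score with respect to every component) is omitted because it requires the model.

```python
import random

def upsample_nearest(grid, size):
    """Nearest-neighbour upsample a square r x r grid (list of lists) to size x size."""
    r = len(grid)
    return [[grid[y * r // size][x * r // size] for x in range(size)]
            for y in range(size)]

def compose(components, size):
    """Sum all components after upsampling each one to size x size."""
    image = [[0.0] * size for _ in range(size)]
    for comp in components:
        up = upsample_nearest(comp, size)
        for y in range(size):
            for x in range(size):
                image[y][x] += up[y][x]
    return image

TARGET = 224
# Assumed schedule: the text only states that the scales span 1x1 to 224x224.
RESOLUTIONS = [1, 2, 4, 8, 16, 32, 64, 128, 224]

rng = random.Random(0)
# One free parameter grid per scale; in an actual run these would be the
# variables updated by gradient ascent rather than fixed random values.
components = [[[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in range(r)]
              for r in RESOLUTIONS]

image = compose(components, TARGET)
print(len(image), len(image[0]))
```

In this parameterization a change to a coarse component moves large blocks of pixels together, while only the finest component can alter individual pixels. This is one plausible reading of why spreading the optimization across scales would favor low-frequency structure over pixel-level adversarial patterns.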