Abstract
In this paper, we propose LSRNA, a novel framework for higher-resolution (exceeding 1K) image generation using diffusion models by leveraging super-resolution directly in the latent space. Existing diffusion models struggle with scaling beyond their training resolutions, often leading to structural distortions or content repetition. Reference-based methods address the issues by upsampling a low-resolution reference to guide higher-resolution generation. However, they face significant challenges: upsampling in latent space often causes manifold deviation, which degrades output quality. On the other hand, upsampling in RGB space tends to produce overly smoothed outputs. To overcome these limitations, LSRNA combines Latent space Super-Resolution (LSR) for manifold alignment and Region-wise Noise Addition (RNA) to enhance high-frequency details. Our extensive experiments demonstrate that integrating LSRNA outperforms state-of-the-art reference-based methods across various resolutions and metrics, while showing the critical role of latent space upsampling in preserving detail and sharpness. The code is available at https://github.com/3587jjh/LSRNA.