Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

Abstract

High-quality HDRIs (High Dynamic Range Images), typically HDR panoramas, are one of the most popular ways to create photorealistic lighting and 360-degree reflections of 3D scenes in graphics. Given the difficulty of capturing HDRIs, a versatile and controllable generative model is highly desired, where layman users can intuitively control the generation process. However, existing state-of-the-art methods still struggle to synthesize high-quality panoramas for complex scenes. In this work, we propose a zero-shot text-driven framework, Text2Light, to generate 4K+ resolution HDRIs without paired training data. Given a free-form text as the description of the scene, we synthesize the corresponding HDRI with two dedicated steps: 1) text-driven panorama generation in low dynamic range (LDR) and low resolution, and 2) super-resolution inverse tone mapping to scale up the LDR panorama both in resolution and dynamic range. Specifically, to achieve zero-shot text-driven panorama generation, we first build dual codebooks as the discrete representation for diverse environmental textures. Then, driven by the pre-trained CLIP model, a text-conditioned global sampler learns to sample holistic semantics from the global codebook according to the input text. Furthermore, a structure-aware local sampler learns to synthesize LDR panoramas patch-by-patch, guided by holistic semantics. To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere. This continuous representation enables a versatile module to upscale the resolution and dynamic range simultaneously. Extensive experiments demonstrate the superior capability of Text2Light in generating high-quality HDR panoramas. In addition, we show the feasibility of our work in realistic rendering and immersive VR.
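
To make the phrase "anchored to the sphere" concrete, the sketch below shows the standard mapping from a pixel of an equirectangular panorama to a unit direction on the sphere. It illustrates the geometry only and is not the authors' implementation: the function name `equirect_pixel_to_direction` and the axis convention (y up, longitude increasing left to right) are assumptions made for this example.

```python
import math


def equirect_pixel_to_direction(row, col, height, width):
    """Map the centre of pixel (row, col) to a unit 3D direction.

    Assumed convention (not taken from the paper): y points up,
    longitude runs from -pi (left edge) to +pi (right edge), and
    latitude runs from +pi/2 (top edge) to -pi/2 (bottom edge).
    """
    # Normalise pixel centres to [0, 1) and convert to angles.
    lon = ((col + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((row + 0.5) / height) * math.pi

    # Spherical-to-Cartesian conversion on the unit sphere.
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

Under this convention every pixel of a `height` x `width` panorama gets one point on the sphere, so a per-pixel or per-patch latent code can be stored against a spherical position rather than a flat image coordinate.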
