Abstract
Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR) accurately simulates light transport, but relies on precise scene representations (explicit 3D geometry, high-quality material properties, and lighting conditions) that are often impractical to obtain in real-world scenarios. Therefore, we introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework. Leveraging powerful video diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing both an interface for image editing tasks and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates inverse and forward rendering, consistently outperforming the state-of-the-art. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.