Abstract
Procedural Content Generation (PCG) is powerful for creating high-quality 3D content, yet controlling it to produce desired shapes is difficult and often requires extensive parameter tuning. Inverse Procedural Content Generation aims to automatically find the best parameters under the input condition. However, existing sampling-based and neural network-based methods still require numerous sampling iterations or offer limited controllability. In this work, we present DI-PCG, a novel and efficient method for Inverse PCG from general image conditions. At its core is a lightweight diffusion transformer model, where PCG parameters are directly treated as the denoising target and the observed images as conditions to control parameter generation. DI-PCG is efficient and effective. With only 7.6M network parameters and 30 GPU hours to train, it demonstrates superior performance in recovering parameters accurately and generalizes well to in-the-wild images. Quantitative and qualitative experimental results validate the effectiveness of DI-PCG in inverse PCG and image-to-3D generation tasks. DI-PCG offers a promising approach for efficient inverse PCG and represents a valuable exploration step towards a 3D generation path that models how to construct a 3D asset using parametric models.