Abstract
3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to its discrete, unstructured, and permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as a multi-channel image whose feature dimension is a concatenation of Gaussian attributes such as position, scale, color, opacity, and rotation. We further find that these heterogeneous features can be compressed into a lower-dimensional (e.g., 3-channel) shared feature space using a carefully designed multi-branch network. The compressed UVGS can be treated as a typical RGB image. Remarkably, we discover that typical VAEs trained with latent diffusion models can directly generalize to this new representation without additional training. Our novel representation makes it effortless to leverage foundational 2D models, such as diffusion models, to directly model 3DGS. Additionally, one can simply increase the 2D UV resolution to accommodate more Gaussians, making UVGS a scalable solution compared to typical 3D backbones. This approach immediately unlocks various novel generation applications of 3DGS by inherently utilizing the superior 2D generation capabilities that have already been developed. In our experiments, we demonstrate unconditional generation, conditional generation, and inpainting applications of 3DGS based on diffusion models, which were previously non-trivial.
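To make the spherical-mapping idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each Gaussian is a dictionary with the attributes listed above, that rotation is stored as a 4-component quaternion (giving 3 + 3 + 3 + 1 + 4 = 14 channels), that the scene is centered at the origin, and that when two Gaussians fall into the same pixel the one farther from the origin is kept. The function names `spherical_uv` and `gaussians_to_uvgs`, the channel order, and the collision policy are all illustrative assumptions; the abstract does not specify them.

```python
import math

# Assumed channel layout (not specified in the abstract):
# position(3) + scale(3) + color(3) + opacity(1) + rotation quaternion(4) = 14
NUM_CHANNELS = 14


def radius(position):
    """Euclidean distance of a 3D point from the origin."""
    x, y, z = position
    return math.sqrt(x * x + y * y + z * z)


def spherical_uv(position, height, width):
    """Map a 3D point to integer (row, col) pixel coordinates on an H x W UV grid."""
    x, y, z = position
    r = radius(position)
    if r == 0.0:
        # The origin has no direction; placing it at (height // 2, 0) is arbitrary.
        return height // 2, 0
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # polar angle in [0, pi]
    phi = math.atan2(y, x) % (2.0 * math.pi)       # azimuth in [0, 2*pi)
    # Clamp so that theta == pi (or rounding in phi) stays inside the grid.
    row = min(int(theta / math.pi * height), height - 1)
    col = min(int(phi / (2.0 * math.pi) * width), width - 1)
    return row, col


def gaussians_to_uvgs(gaussians, height=64, width=64):
    """Scatter Gaussians into an H x W x 14 grid; returns (grid, occupancy mask)."""
    grid = [[[0.0] * NUM_CHANNELS for _ in range(width)] for _ in range(height)]
    depth = [[-1.0] * width for _ in range(height)]  # -1 marks an empty pixel
    for g in gaussians:
        row, col = spherical_uv(g["position"], height, width)
        r = radius(g["position"])
        # Collision policy (assumption): keep the Gaussian farthest from the origin.
        if r > depth[row][col]:
            depth[row][col] = r
            grid[row][col] = (
                list(g["position"]) + list(g["scale"]) + list(g["color"])
                + [g["opacity"]] + list(g["rotation"])
            )
    mask = [[d >= 0.0 for d in row] for row in depth]
    return grid, mask
```

In this sketch, raising `height` and `width` leaves more pixels for Gaussians to land in, which mirrors the scalability argument above. The resulting 14-channel grid is the kind of input a compression network would then reduce to a 3-channel image.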