Abstract
The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. For increased representational capacity of our avatars, we propose per-Gaussian latent features that condition each primitive's dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 dB PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
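
To make the idea of a Laplacian term on per-Gaussian latent features concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify the exact form of the regularizer, so the k-nearest-neighbor graph, the squared-difference penalty, and the names `knn_indices`, `laplacian_penalty`, `centers`, and `latents` are hypothetical choices for illustration, not the formulation used in NPGA.

```python
import numpy as np

def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbours (self excluded)."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def laplacian_penalty(values, neighbours):
    """Mean squared difference between each row and the mean of its neighbours' rows."""
    neighbour_mean = values[neighbours].mean(axis=1)  # (N, k, D) -> (N, D)
    return float(np.mean(np.sum((values - neighbour_mean) ** 2, axis=-1)))

# Hypothetical toy data: 200 Gaussian centres in 3D with 8-dimensional latents.
rng = np.random.default_rng(0)
centers = rng.normal(size=(200, 3))
latents = rng.normal(size=(200, 8))
nbrs = knn_indices(centers, k=4)

noisy_penalty = laplacian_penalty(latents, nbrs)            # random latents: positive
smooth_penalty = laplacian_penalty(np.ones((200, 8)), nbrs)  # constant latents: zero
```

A penalty of this kind is zero when neighbouring primitives share the same value and grows as they diverge. The same function could be applied to predicted per-Gaussian offsets instead of latents, which is one plausible reading of regularizing both "the latent features and predicted dynamics".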