Abstract
The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representations, have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the large number of points involved, and neural implicit representations are limited in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that couples a 3D Gaussian point representation with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points move under a Gaussian deformation field in alignment with the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity and are rendered with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results compared with previous methods.