Abstract
With NeRF widely used for facial reenactment, recent methods can recover photo-realistic 3D head avatars from just a monocular video. Unfortunately, training NeRF-based methods is quite time-consuming, because the MLPs they rely on are inefficient and require too many iterations to converge. To overcome this problem, we propose ManVatar, a fast 3D head avatar reconstruction method using Motion-Aware Neural Voxels. ManVatar is the first method to decouple expression motion from canonical appearance for head avatars and to model the expression motion with neural voxels. In particular, the motion-aware neural voxels are generated from the weighted concatenation of multiple 4D tensors. The 4D tensors correspond semantically one-to-one with the 3DMM expression bases, and each tensor is weighted by the corresponding 3DMM expression coefficient. Benefiting from this novel representation, ManVatar can recover photo-realistic head avatars in just 5 minutes (implemented in pure PyTorch), which is significantly faster than state-of-the-art facial reenactment methods.
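The weighted concatenation described above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy instead of PyTorch, and the tensor layout `(N, C, D, H, W)` (one `(C, D, H, W)` feature grid per expression basis), concatenation along the feature axis, sampling by trilinear interpolation in voxel-index coordinates, and the helper names `build_motion_voxels` and `trilinear_sample` are all assumptions made for illustration. How the sampled feature is later decoded into expression motion is not shown.

```python
import numpy as np


def build_motion_voxels(expr_tensors, expr_coeffs):
    """Weighted concatenation of per-basis 4D tensors (assumed layout).

    expr_tensors: (N, C, D, H, W), one (C, D, H, W) grid per 3DMM expression basis.
    expr_coeffs:  (N,), the 3DMM expression coefficients for the current frame.
    Returns a (N * C, D, H, W) grid of motion-aware features.
    """
    n = expr_tensors.shape[0]
    if expr_coeffs.shape != (n,):
        raise ValueError("need exactly one coefficient per expression tensor")
    # Scale each basis tensor by its expression coefficient, then stack the
    # scaled tensors along the feature (channel) axis.
    weighted = [coeff * tensor for coeff, tensor in zip(expr_coeffs, expr_tensors)]
    return np.concatenate(weighted, axis=0)


def trilinear_sample(grid, point):
    """Trilinearly interpolate a (F, D, H, W) grid at voxel coordinate (z, y, x)."""
    _, d, h, w = grid.shape
    # Clamp the query to the grid so all eight corner indices are valid.
    z, y, x = (float(np.clip(v, 0, s - 1)) for v, s in zip(point, (d, h, w)))
    z0, y0, x0 = int(np.floor(z)), int(np.floor(y)), int(np.floor(x))
    z1, y1, x1 = min(z0 + 1, d - 1), min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dz, dy, dx = z - z0, y - y0, x - x0
    out = np.zeros(grid.shape[0])
    # Accumulate the eight corner features, each weighted by its trilinear weight.
    for zi, wz in ((z0, 1.0 - dz), (z1, dz)):
        for yi, wy in ((y0, 1.0 - dy), (y1, dy)):
            for xi, wx in ((x0, 1.0 - dx), (x1, dx)):
                out += wz * wy * wx * grid[:, zi, yi, xi]
    return out


# Usage with random placeholder data (not trained weights).
rng = np.random.default_rng(0)
num_bases, channels, res = 4, 2, 8
tensors = rng.standard_normal((num_bases, channels, res, res, res))
coeffs = np.array([0.5, 0.0, 1.0, -0.2])
voxels = build_motion_voxels(tensors, coeffs)
print(voxels.shape)  # (8, 8, 8, 8)
feature = trilinear_sample(voxels, (3.5, 2.25, 7.0))
print(feature.shape)  # (8,)
```

One consequence of this design is that a zero coefficient zeroes out the channels of its basis, so each block of channels in the combined grid stays tied to one expression basis.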