Abstract
Recent advances in Neural Radiance Fields (NeRFs) have made it possible to reconstruct and reanimate dynamic portrait scenes with control over head pose, facial expressions, and viewing direction. However, training such models assumes photometric consistency over the deformed region; e.g., the face must be evenly lit as it deforms with changing head pose and facial expression. Such photometric consistency across frames of a video is hard to maintain, even in studio environments, which makes the resulting reanimatable neural portraits prone to artifacts during reanimation. In this work, we propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits in real-world capture conditions. CoDyNeRF learns to approximate illumination-dependent effects via a dynamic appearance model in the canonical space that is conditioned on predicted surface normals and on the facial expression and head-pose deformations. The surface normal prediction is guided by 3DMM normals, which act as a coarse prior for the normals of the human head, where direct prediction of normals is hard due to the rigid and non-rigid deformations induced by head-pose and facial expression changes. Using only a short smartphone-captured video of a subject for training, we demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head pose and expression controls and realistic lighting effects. The project page can be found here: http://shahrukhathar.github.io/2023/08/22/CoDyNeRF.html