Abstract
We present IntrinsicAvatar, a novel approach to recovering the intrinsic properties of clothed human avatars including geometry, albedo, material, and environment lighting from only monocular videos. Recent advancements in human-based neural rendering have enabled high-quality geometry and appearance reconstruction of clothed humans from just monocular videos. However, these methods bake intrinsic properties such as albedo, material, and environment lighting into a single entangled neural representation. On the other hand, only a handful of works tackle the problem of estimating geometry and disentangled appearance properties of clothed humans from monocular videos. They usually achieve limited quality and disentanglement due to approximations of secondary shading effects via learned MLPs. In this work, we propose to model secondary shading effects explicitly via Monte-Carlo ray tracing. We model the rendering process of clothed humans as a volumetric scattering process, and combine ray tracing with body articulation. Our approach can recover high-quality geometry, albedo, material, and lighting properties of clothed humans from a single monocular video, without requiring supervised pre-training using ground truth materials. Furthermore, since we explicitly model the volumetric scattering process and ray tracing, our model naturally generalizes to novel poses, enabling animation of the reconstructed avatar in novel lighting conditions.