LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh

Abstract

Generalizable rendering of an animatable human avatar from sparse inputs relies on data priors and inductive biases extracted from training on large data to avoid scene-specific optimization and to enable fast reconstruction. This raises two main challenges: First, unlike iterative gradient-based adjustment in scene-specific optimization, generalizable methods must reconstruct the human shape representation in a single pass at inference time. Second, rendering is preferably computationally efficient yet of high resolution. To address both challenges we augment the recently proposed dual shape representation, which combines the benefits of a mesh and Gaussian points, in two ways. To improve reconstruction, we propose an iterative feedback update framework, which successively improves the canonical human shape representation during reconstruction. To achieve computationally efficient yet high-resolution rendering, we study a coupled-multi-resolution Gaussians-on-Mesh representation. We evaluate the proposed approach on the challenging THuman2.0, XHuman and AIST++ data. Our approach reconstructs an animatable representation from sparse inputs in less than 1 s, renders views with 95.1 FPS at $1024 \times 1024$, and achieves PSNR/LPIPS*/FID of 24.65/110.82/51.27 on THuman2.0, outperforming the state-of-the-art in rendering quality.
