InsTaG: Learning Personalized 3D Talking Head from Few-Second Video

Abstract

Despite exhibiting impressive performance in synthesizing lifelike personalized 3D talking heads, prevailing methods based on radiance fields suffer from high demands for training data and time for each new identity. This paper introduces InsTaG, a 3D talking head synthesis framework that enables fast learning of a realistic personalized 3D talking head from little training data. Built upon a lightweight 3DGS person-specific synthesizer with universal motion priors, InsTaG achieves high-quality and fast adaptation while preserving high-level personalization and efficiency. As preparation, we first propose an Identity-Free Pre-training strategy that enables the pre-training of the person-specific model and encourages the collection of universal motion priors from a long-video data corpus. To fully exploit the universal motion priors when learning an unseen new identity, we then present a Motion-Aligned Adaptation strategy to adaptively align the target head to the pre-trained field and constrain a robust dynamic head structure with little training data. Experiments demonstrate our outstanding performance and efficiency in rendering high-quality personalized talking heads under various data scenarios.
