LIA-X: Interpretable Latent Portrait Animator

Abstract

We introduce LIA-X, a novel interpretable portrait animator designed to transfer facial dynamics from a driving video to a source portrait with fine-grained control. LIA-X is an autoencoder that models motion transfer as a linear navigation of motion codes in latent space. Crucially, it incorporates a novel Sparse Motion Dictionary that enables the model to disentangle facial dynamics into interpretable factors. Deviating from previous 'warp-render' approaches, the interpretability of the Sparse Motion Dictionary allows LIA-X to support a highly controllable 'edit-warp-render' strategy, enabling precise manipulation of fine-grained facial semantics in the source portrait. This helps to narrow initial differences with the driving video in terms of pose and expression. Moreover, we demonstrate the scalability of LIA-X by successfully training a large-scale model with approximately 1 billion parameters on extensive datasets. Experimental results show that our proposed method outperforms previous approaches in both self-reenactment and cross-reenactment tasks across several benchmarks. Additionally, the interpretable and controllable nature of LIA-X supports practical applications such as fine-grained, user-guided image and video editing, as well as 3D-aware portrait video manipulation.
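
To make the idea of "linear navigation of motion codes" more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a target latent code is obtained by adding a sparse linear combination of dictionary vectors to a source code; the abstract does not state the exact formula, so this form, along with the names `navigate` and `edit_single_factor`, the toy dimensions, and all numeric values, is hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]


def navigate(source_code: Vector, dictionary: List[Vector], coeffs: List[float]) -> Vector:
    """Return source_code + sum_i coeffs[i] * dictionary[i].

    Assumption: motion transfer is modeled as a linear step in latent space
    along a few dictionary directions (most coefficients are zero).
    """
    if len(dictionary) != len(coeffs):
        raise ValueError("need exactly one coefficient per dictionary vector")
    target = list(source_code)
    for coeff, direction in zip(coeffs, dictionary):
        if len(direction) != len(target):
            raise ValueError("dictionary vector has the wrong dimension")
        if coeff == 0.0:
            continue  # inactive factor: contributes nothing
        for k, value in enumerate(direction):
            target[k] += coeff * value
    return target


def edit_single_factor(coeffs: List[float], index: int, delta: float) -> List[float]:
    """Return a copy of coeffs with one factor shifted by delta.

    This mimics the 'edit' step: adjusting one interpretable factor
    (for example, a hypothetical head-pose direction) before animating.
    """
    edited = list(coeffs)
    edited[index] += delta
    return edited


# Toy example: a 4-dimensional latent space with 3 dictionary vectors.
toy_dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
source = [1.0, 1.0, 1.0, 1.0]
sparse_coeffs = [0.0, 0.5, 0.0]  # only the second factor is active

moved = navigate(source, toy_dictionary, sparse_coeffs)
edited_coeffs = edit_single_factor(sparse_coeffs, 2, 2.0)
moved_after_edit = navigate(source, toy_dictionary, edited_coeffs)
```

Under these assumptions, changing a single coefficient moves the latent code along exactly one dictionary direction, which is the sense in which a sparse dictionary could make individual motion factors separately controllable.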
