Durian: Dual Reference-guided Portrait Animation with Attribute Transfer

  • 2025-09-04 17:53:03
  • Hyunsoo Cha, Byungjun Kim, Hanbyul Joo

Abstract

We present Durian, the first method for generating portrait animation videos with facial attribute transfer from a given reference image to a target portrait in a zero-shot manner. To enable high-fidelity and spatially consistent attribute transfer across frames, we introduce dual reference networks that inject spatial features from both the portrait and attribute images into the denoising process of a diffusion model. We train the model using a self-reconstruction formulation, where two frames are sampled from the same portrait video: one is treated as the attribute reference and the other as the target portrait, and the remaining frames are reconstructed conditioned on these inputs and their corresponding masks. To support the transfer of attributes with varying spatial extent, we propose a mask expansion strategy using keypoint-conditioned image generation for training. In addition, we further augment the attribute and portrait images with spatial and appearance-level transformations to improve robustness to positional misalignment between them. These strategies allow the model to effectively generalize across diverse attributes and in-the-wild reference combinations, despite being trained without explicit triplet supervision. Durian achieves state-of-the-art performance on portrait animation with attribute transfer, and notably, its dual reference design enables multi-attribute composition in a single generation pass without additional training.
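
The self-reconstruction sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `sample_training_tuple`, the `num_targets` parameter, and the choice to draw the reconstructed frames uniformly from the remaining frames are all assumptions made for illustration. The abstract only states that two frames serve as the attribute reference and the target portrait, and that the remaining frames are reconstructed.

```python
import random


def sample_training_tuple(num_frames, num_targets, rng=None):
    """Pick frame indices for one self-reconstruction training example.

    Hypothetical helper: two distinct frames of the same video act as the
    attribute reference and the portrait reference; the frames to be
    reconstructed are drawn from the rest (uniform sampling is an assumption).
    """
    rng = rng or random.Random()
    if num_frames < num_targets + 2:
        raise ValueError("video too short for two references plus targets")

    # Two distinct reference frames from the same portrait video.
    attribute_ref, portrait_ref = rng.sample(range(num_frames), 2)

    # Frames to reconstruct come from everything except the two references.
    remaining = [i for i in range(num_frames)
                 if i not in (attribute_ref, portrait_ref)]
    targets = sorted(rng.sample(remaining, num_targets))

    return {
        "attribute_ref": attribute_ref,
        "portrait_ref": portrait_ref,
        "targets": targets,
    }


if __name__ == "__main__":
    print(sample_training_tuple(num_frames=16, num_targets=4,
                                rng=random.Random(0)))
```

Because both references come from the same video, the ground-truth frames already show the portrait with the attribute, which is why no explicit triplet supervision is needed.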

 
