MetaPix: Few-Shot Video Retargeting

Abstract

We address the task of unsupervised retargeting of human actions from one video to another. We consider the challenging setting where only a few frames of the target are available. The core of our approach is a conditional generative model that can transcode input skeletal poses (automatically extracted with an off-the-shelf pose estimator) to output target frames. However, it is challenging to build a universal transcoder because humans can appear wildly different due to clothing and background scene geometry. Instead, we learn to adapt, or personalize, a universal generator to the particular human and background in the target. To do so, we make use of meta-learning to discover effective strategies for on-the-fly personalization. One significant benefit of meta-learning is that the personalized transcoder naturally enforces temporal coherence across its generated frames; all frames contain consistent clothing and background geometry of the target. We experiment on in-the-wild internet videos and images and show our approach improves over widely used baselines for the task.
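
To make the personalization idea concrete, the following is a minimal toy sketch, not the paper's implementation. The abstract does not name the meta-learning algorithm, so a Reptile-style first-order update is assumed here. A two-parameter linear model stands in for the generator, scalar inputs stand in for poses, and scalar outputs stand in for frames. All function names (`personalize`, `sample_task`, `meta_train`) are hypothetical.

```python
import numpy as np

def mse_grad(w, x, y):
    # Gradient of the mean squared error for the toy model y_hat = w[0] * x + w[1].
    err = w[0] * x + w[1] - y
    return np.array([2.0 * np.mean(err * x), 2.0 * np.mean(err)])

def personalize(w, x, y, steps=5, lr=0.1):
    # Few-shot adaptation: a handful of gradient steps on the target's few samples.
    w = w.copy()
    for _ in range(steps):
        w -= lr * mse_grad(w, x, y)
    return w

def sample_task(rng, n=5):
    # Each "target" is a line whose slope and offset vary slightly (assumed toy data).
    a = rng.normal(3.0, 0.1)
    b = rng.normal(-1.0, 0.1)
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, a * x + b, (a, b)

def meta_train(rng, meta_steps=500, meta_lr=0.1):
    # Reptile-style outer loop (an assumption): move the shared initialization
    # toward the weights obtained after personalizing to each sampled task.
    theta = np.zeros(2)
    for _ in range(meta_steps):
        x, y, _ = sample_task(rng)
        phi = personalize(theta, x, y)
        theta += meta_lr * (phi - theta)
    return theta

def grid_loss(w, a, b):
    # Held-out error of the model against the true line on a dense grid.
    x = np.linspace(-1.0, 1.0, 50)
    return float(np.mean((w[0] * x + w[1] - (a * x + b)) ** 2))
```

In this sketch, `meta_train` plays the role of learning a universal initialization, and `personalize` plays the role of on-the-fly adaptation to a new target from a few samples. Starting `personalize` from the meta-learned weights rather than from zeros is what lets a handful of gradient steps suffice.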
