Sim2real transfer learning for 3D pose estimation: motion to the rescue

Abstract

Simulation is an anonymous, low-bias source of data where annotation can often be done automatically; however, for some tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern standard of 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset.
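
To make the "motion of 2D keypoints" cue concrete, the following is a minimal sketch that assumes the cue can be approximated as the per-joint displacement between consecutive frames. The abstract does not specify the exact representation used, so the function name `keypoint_motion`, the `Keypoints` type, and the sample track are all hypothetical and only illustrate the general idea.

```python
from typing import List, Tuple

# One (x, y) pair per joint, in pixels. This layout is an assumption
# made for illustration, not the paper's stated input format.
Keypoints = List[Tuple[float, float]]


def keypoint_motion(frames: List[Keypoints]) -> List[Keypoints]:
    """Return the per-joint (dx, dy) displacement between consecutive frames.

    The result has len(frames) - 1 entries; fewer than two frames yields [].
    """
    if len(frames) < 2:
        return []
    motion = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("every frame must contain the same number of joints")
        motion.append(
            [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev, curr)]
        )
    return motion


# Hypothetical three-frame track of two joints.
track = [
    [(10.0, 20.0), (30.0, 40.0)],
    [(12.0, 20.0), (30.0, 43.0)],
    [(15.0, 19.0), (31.0, 47.0)],
]
print(keypoint_motion(track))
```

Because the displacements discard absolute appearance and keep only how joints move, a representation of this kind looks the same whether the keypoints come from a rendered or a real video, which is the intuition behind using motion to narrow the sim2real gap.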
