De-rendering 3D Objects in the Wild

Abstract

With increasing focus on augmented and virtual reality applications (XR) comes the demand for algorithms that can lift objects from images and videos into representations that are suitable for a wide variety of related 3D tasks. Large-scale deployment of XR devices and applications means that we cannot solely rely on supervised learning, as collecting and annotating data for the unlimited variety of objects in the real world is infeasible. We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters. For training, the method only relies on a rough initial shape estimate of the training objects to bootstrap the learning process. This shape supervision can come, for example, from a pretrained depth network or, more generically, from a traditional structure-from-motion pipeline. In our experiments, we show that the method can successfully de-render 2D images into a decomposed 3D representation and generalizes to unseen object categories. Since in-the-wild evaluation is difficult due to the lack of ground truth data, we also introduce a photo-realistic synthetic test set that allows for quantitative evaluation.
