Learning 3D Object Shape and Layout without 3D Supervision

  • 2022-06-14 18:49:44
  • Georgia Gkioxari, Nikhila Ravi, Justin Johnson

Abstract

A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space. Understanding 3D scenes from 2D images is an important goal, with applications in robotics and graphics. While there have been recent advances in predicting 3D shape and layout from a single image, most approaches rely on 3D ground truth for training which is expensive to collect at scale. We overcome these limitations and propose a method that learns to predict 3D shape and layout for objects without any ground truth shape or layout information: instead we rely on multi-view images with 2D supervision which can more easily be collected at scale. Through extensive experiments on 3D Warehouse, Hypersim, and ScanNet we demonstrate that our approach scales to large datasets of realistic images, and compares favorably to methods relying on 3D ground truth. On Hypersim and ScanNet where reliable 3D ground truth is not available, our approach outperforms supervised approaches trained on smaller and less diverse datasets.

 
