Multiview Compressive Coding for 3D Reconstruction

Abstract

A central goal of visual recognition is to understand objects and scenes from a single image. 2D recognition has witnessed tremendous progress thanks to large-scale learning and general-purpose representations. Comparatively, 3D poses new challenges stemming from occlusions not depicted in the image. Prior works try to overcome these by inferring from multiple views or rely on scarce CAD models and category-specific priors which hinder scaling to novel settings. In this work, we explore single-view 3D reconstruction by learning generalizable representations inspired by advances in self-supervised learning. We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos. Our model, Multiview Compressive Coding (MCC), learns to compress the input appearance and geometry to predict the 3D structure by querying a 3D-aware decoder. MCC's generality and efficiency allow it to learn from large-scale and diverse data sources with strong generalization to novel objects imagined by DALL$\cdot$E 2 or captured in-the-wild with an iPhone.
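
The encode-then-query idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`encode`, `decode`), the single cross-attention step, the occupancy-style output, and the random untrained weights are all assumptions made only to show the data flow, in which RGB-D points are compressed into tokens and arbitrary 3D query points are then decoded against those tokens.

```python
import numpy as np

def encode(rgb, xyz, w_enc):
    """Compress per-point appearance (rgb) and geometry (xyz) into feature tokens."""
    feats = np.concatenate([rgb, xyz], axis=-1)      # (N, 6)
    return np.tanh(feats @ w_enc)                    # (N, D)

def decode(queries, tokens, w_q, w_out):
    """Cross-attend from 3D query points to the tokens; return a score in [0, 1] per query."""
    q = queries @ w_q                                # (M, D)
    scores = q @ tokens.T / np.sqrt(tokens.shape[1]) # (M, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    context = attn @ tokens                          # (M, D)
    logits = context @ w_out                         # (M,)
    return 1.0 / (1.0 + np.exp(-logits))             # hypothetical occupancy score

# Toy data: N visible points from one RGB-D frame, M query locations in 3D.
rng = np.random.default_rng(0)
N, M, D = 64, 10, 16
rgb = rng.random((N, 3))
xyz = rng.normal(size=(N, 3))

# Random (untrained) weights, used only to exercise the shapes.
w_enc = rng.normal(size=(6, D))
w_q = rng.normal(size=(3, D))
w_out = rng.normal(size=D)

tokens = encode(rgb, xyz, w_enc)
queries = rng.uniform(-1.0, 1.0, size=(M, 3))
occupancy = decode(queries, tokens, w_q, w_out)
```

Because the decoder is evaluated per query point, the same encoded tokens can be probed at any number of 3D locations, including ones occluded in the input view; with trained weights, that is the property the abstract relies on for single-view reconstruction.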
