Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

Abstract

We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). Each object is represented by an oriented 3D bounding box and a multi-layer perceptron (MLP) that takes position, direction, and time and outputs density and radiance. The background stuff is represented by a similar MLP that additionally outputs semantic labels. The object MLPs are instance-specific and can therefore be smaller and faster than those of previous object-aware approaches, while still leveraging category-specific priors incorporated via meta-learned initialization. Our model builds a panoptic radiance field representation of any scene from color images alone. We use off-the-shelf algorithms to predict camera poses, object tracks, and 2D image semantic segmentations. We then jointly optimize the MLP weights and bounding box parameters using analysis-by-synthesis, with self-supervision from color images and pseudo-supervision from the predicted semantic segmentations. In experiments on real-world dynamic scenes, we find that our model can be used effectively for several tasks, including novel view synthesis, 2D panoptic segmentation, 3D scene editing, and multiview depth prediction.
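The decomposition described above (per-object oriented boxes with their own MLPs, plus a stuff MLP with a semantic head) can be illustrated with the toy sketch below. It is a minimal sketch under stated assumptions, not the authors' implementation: all names (`TinyMLP`, `OrientedBox`, `ThingField`, `query_point`, `render_ray`) are hypothetical; boxes are assumed to rotate only about the vertical z-axis; a sample point inside an object's box is assumed to be routed to that object's MLP in box-local coordinates and every other point to the stuff MLP; weights are random (no training and no meta-learned initialization); and standard NeRF-style alpha compositing is swapped in for rendering, with each ray's panoptic label taken as the label with the largest accumulated compositing weight.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps density non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Overflow-free logistic function; maps raw outputs to [0, 1] radiance.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


class TinyMLP:
    """Hypothetical toy MLP: (position, direction, time) -> raw outputs."""

    def __init__(self, n_out: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(7)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(n_out)]

    def __call__(self, pos: Vec3, direction: Vec3, t: float) -> List[float]:
        x = [*pos, *direction, t]
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]


@dataclass
class OrientedBox:
    """Assumed box model: center, yaw about the z-axis, and half-extents."""
    center: Vec3
    yaw: float
    half_size: Vec3

    def rotate_to_local(self, v: Vec3) -> Vec3:
        # Undo the box's yaw (rotate by -yaw about z).
        c, s = math.cos(-self.yaw), math.sin(-self.yaw)
        return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])

    def to_local(self, p: Vec3) -> Vec3:
        d = (p[0] - self.center[0], p[1] - self.center[1], p[2] - self.center[2])
        return self.rotate_to_local(d)

    def contains(self, p: Vec3) -> bool:
        q = self.to_local(p)
        return all(abs(q[i]) <= self.half_size[i] for i in range(3))


@dataclass
class ThingField:
    instance_id: int
    box: OrientedBox
    mlp: TinyMLP  # instance-specific weights (4 outputs: density + RGB)


def query_point(p, d, t, things, stuff_mlp, stuff_classes):
    # Assumed routing: points inside a thing's box use that thing's MLP.
    for thing in things:
        if thing.box.contains(p):
            out = thing.mlp(thing.box.to_local(p), thing.box.rotate_to_local(d), t)
            rgb = tuple(sigmoid(o) for o in out[1:4])
            return softplus(out[0]), rgb, ("thing", thing.instance_id)
    # Otherwise the stuff MLP gives density, RGB, and semantic logits.
    out = stuff_mlp(p, d, t)
    logits = out[4:]
    cls = stuff_classes[max(range(len(logits)), key=logits.__getitem__)]
    return softplus(out[0]), tuple(sigmoid(o) for o in out[1:4]), ("stuff", cls)


def render_ray(origin, d, t, things, stuff_mlp, stuff_classes, near=0.5, far=6.0, n=64):
    # NeRF-style alpha compositing of samples along one ray.
    delta = (far - near) / n
    transmittance, color, label_weight = 1.0, [0.0, 0.0, 0.0], {}
    for i in range(n):
        s = near + (i + 0.5) * delta
        p = tuple(origin[k] + s * d[k] for k in range(3))
        sigma, rgb, label = query_point(p, d, t, things, stuff_mlp, stuff_classes)
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        color = [c + w * x for c, x in zip(color, rgb)]
        label_weight[label] = label_weight.get(label, 0.0) + w
        transmittance *= 1.0 - alpha
    label = max(label_weight, key=label_weight.get)
    return tuple(color), label, 1.0 - transmittance


# Usage: one hypothetical car instance in front of the camera.
stuff_classes = ["road", "building", "sky"]
car = ThingField(7, OrientedBox((0.0, 3.0, 0.0), math.pi / 4, (1.0, 2.0, 0.75)), TinyMLP(4, seed=1))
stuff = TinyMLP(4 + len(stuff_classes), seed=2)
rgb, label, opacity = render_ray((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.0, [car], stuff, stuff_classes)
```

Because each object owns its own small MLP and is queried in its box frame, editing the scene in this sketch amounts to changing an `OrientedBox` (moving, rotating, or deleting it) without touching any network weights.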
