Abstract
We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). Each object is represented by an oriented 3D bounding box and a multi-layer perceptron (MLP) that takes position, direction, and time and outputs density and radiance. The background stuff is represented by a similar MLP that additionally outputs semantic labels. Each object MLP is instance-specific and thus can be smaller and faster than the networks used by previous object-aware approaches, while still leveraging category-specific priors incorporated via meta-learned initialization. Our model builds a panoptic radiance field representation of any scene from color images alone. We use off-the-shelf algorithms to predict camera poses, object tracks, and 2D image semantic segmentations. We then jointly optimize the MLP weights and bounding box parameters using analysis-by-synthesis, with self-supervision from color images and pseudo-supervision from the predicted semantic segmentations. In experiments with real-world dynamic scenes, we find that our model can be used effectively for several tasks, including novel view synthesis, 2D panoptic segmentation, 3D scene editing, and multiview depth prediction.