Abstract
A recent trend among generalizable novel view synthesis methods is to learn a rendering operator acting over single camera rays. This approach is promising because it removes the need for explicit volumetric rendering, but it effectively treats target images as collections of independent pixels. Here, we propose to learn a global rendering operator acting over all camera rays jointly. We show that the right representation to enable such rendering is the 5-dimensional plane sweep volume, consisting of the projection of the input images on a set of planes facing the target camera. Based on this understanding, we introduce our Convolutional Global Latent Renderer (ConvGLR), an efficient convolutional architecture that performs the rendering operation globally in a low-resolution latent space. Experiments on various datasets under sparse and generalizable setups show that our approach consistently outperforms existing methods by significant margins.