IBRNet: Learning Multi-View Image-Based Rendering

Abstract

We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multi-view posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods.
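
To make the volume rendering step concrete, the following is a minimal sketch of how per-sample density and color along a single ray might be composited into a pixel color. It assumes the standard discrete quadrature used in neural volume rendering (alpha from density and sample spacing, accumulated front to back); the abstract does not spell out the exact form used, and the names `composite_ray`, `densities`, `colors`, and `deltas` are hypothetical, chosen only for illustration. In the method described above the densities and colors would come from the network; here they are plain numbers.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    densities: volume density (sigma) at each sample
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    Returns the accumulated pixel color and the per-sample weights.
    (Hypothetical helper; an assumed standard quadrature, not the paper's code.)
    """
    transmittance = 1.0  # fraction of light that has not yet been absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Opacity contributed by this sample over its segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        # Light remaining after passing through this sample.
        transmittance *= 1.0 - alpha
    return pixel, weights
```

Every operation in the loop is differentiable with respect to the densities and colors, which is why a photometric loss on rendered pixels can, in principle, supervise the quantities that produce them using only posed images.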
