Neural View-Interpolation for Sparse Light Field Video

Abstract

We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons. First, a NN LF will likely have lower quality than a same-sized pixel basis representation. Second, only few training samples, e.g., 9 exemplars per frame, are available for sparse LF videos. Third, there is no generalization across LFs, only across view and time. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: unlike the linear pixel basis, a NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output for the sparse view coordinates is plausible, it is plausible for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and, consequently, fast learning and fast execution.
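The abstract names a differentiable occlusion-aware warping step without detailing it. The following is a minimal NumPy sketch of what such a warp-and-blend could look like; it is not the authors' architecture. Assumptions, all illustrative: the disparity map of the target view is given (in the described approach a network conditioned on view and time coordinates would produce it; that network is not reproduced here), each sparse source view comes with its own disparity map, a pixel shifts by (view offset × disparity), occlusion is handled by a soft visibility weight exp(-|Δd|/σ), and nearer source views receive inverse-distance weights. The names `bilinear_sample` and `interpolate_view` and the parameter `sigma` are hypothetical.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (H, W, C) at float pixel coordinates x, y (each H x W), clamping to the border."""
    H, W = img.shape[:2]
    x = np.clip(x, 0.0, W - 1.0)
    y = np.clip(y, 0.0, H - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]  # fractional offsets, broadcast over channels
    wy = (y - y0)[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def interpolate_view(views, disps, coords, target, target_disp, sigma=0.5):
    """Hypothetical warp-and-blend of sparse views into a continuous target view coordinate.

    views:       list of (H, W, 3) source images
    disps:       list of (H, W) source disparity maps
    coords:      list of (u, v) view coordinates of the sources
    target:      (u, v) continuous target view coordinate
    target_disp: (H, W) disparity of the target view (assumed given here)
    """
    H, W = target_disp.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    num = np.zeros((H, W, 3))
    den = np.zeros((H, W, 1))
    for img, d, (us, vs) in zip(views, disps, coords):
        du, dv = us - target[0], vs - target[1]
        # Backward warp: look up where each target pixel lands in this source view.
        sx = xs + du * target_disp
        sy = ys + dv * target_disp
        color = bilinear_sample(img, sx, sy)
        d_src = bilinear_sample(d[..., None], sx, sy)[..., 0]
        # Soft visibility: down-weight samples whose source disparity disagrees (likely occluded).
        vis = np.exp(-np.abs(d_src - target_disp) / sigma)
        # Nearer source views contribute more.
        prox = 1.0 / (1e-3 + np.hypot(du, dv))
        w = (vis * prox)[..., None]
        num += w * color
        den += w
    return num / np.maximum(den, 1e-8)
```

Every operation above is (piecewise) differentiable in the disparity, so the same code written in an autodiff framework would let a reconstruction loss on the sparse input views pass gradients back to whatever produces `target_disp`. With zero disparity and identical inputs, the blend simply returns the input image.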
