Abstract
In this paper we propose a convolutional neural network that is designed toupsample a series of sparse range measurements based on the contextual cuesgleaned from a high resolution intensity image. Our approach draws inspirationfrom related work on super-resolution and in-painting. We propose a novelarchitecture that seeks to pull contextual cues separately from the intensityimage and the depth features and then fuse them later in the network. We arguethat this approach effectively exploits the relationship between the twomodalities and produces accurate results while respecting salient imagestructures. We present experimental results to demonstrate that our approach iscomparable with state of the art methods and generalizes well across multipledatasets.