Abstract
Vision-based depth reconstruction is a challenging problem that has been extensively studied in computer vision but still lacks a universal solution. Reconstructing depth from a single image is particularly valuable to mobile robotics, as it can be embedded into modern vision-based simultaneous localization and mapping (vSLAM) methods, providing them with the metric information needed to construct accurate maps at real scale. Nowadays, depth reconstruction is typically done via fully-convolutional neural networks (FCNNs). In this work we experiment with several FCNN architectures and introduce a few enhancements aimed at increasing both the effectiveness and the efficiency of inference. We experimentally determine the solution that provides the best performance/accuracy tradeoff and is able to run on NVidia Jetson at frame rates exceeding 16 FPS for 320 x 240 input. We also evaluate the suggested models by conducting monocular vSLAM of an unknown indoor environment on NVidia Jetson TX2 in real time. An open-source implementation of the models and the inference node for the Robot Operating System (ROS) are available at https://github.com/CnnDepth/tx2_fcnn_node.