Abstract
We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces, represented as sparse TSDF volumes, for each video fragment sequentially with a neural network. A learning-based TSDF fusion module based on gated recurrent units guides the network to fuse features from previous fragments. This design allows the network to capture the local smoothness prior and the global shape prior of 3D surfaces when sequentially reconstructing them, resulting in accurate, coherent, and real-time surface reconstruction. Experiments on the ScanNet and 7-Scenes datasets show that our system outperforms state-of-the-art methods in terms of both accuracy and speed. To the best of our knowledge, this is the first learning-based system that is able to reconstruct dense, coherent 3D geometry in real time.
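
As a rough illustration of the recurrent fusion idea described above, the following sketch applies a standard GRU update to per-voxel feature vectors, carrying a hidden state across successive video fragments. It is a simplified assumption for exposition, not the authors' implementation: it uses dense NumPy matrices with random weights, whereas the abstract describes sparse TSDF volumes and a learned module. The names `gru_fuse`, `params`, `Wz`, `Wr`, and `Wh` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_fuse(hidden, current, params):
    """Fuse features of the current fragment into the running hidden state.

    hidden, current: arrays of shape (num_voxels, channels).
    params: dict of weight matrices of shape (2 * channels, channels).
    This is a plain (non-convolutional, dense) GRU update, used here
    only as a stand-in for a learned fusion module.
    """
    x = np.concatenate([hidden, current], axis=1)
    z = sigmoid(x @ params["Wz"])  # update gate: how much to overwrite
    r = sigmoid(x @ params["Wr"])  # reset gate: how much history to consult
    gated = np.concatenate([r * hidden, current], axis=1)
    candidate = np.tanh(gated @ params["Wh"])
    # Convex combination of the old state and the candidate state.
    return (1.0 - z) * hidden + z * candidate


rng = np.random.default_rng(0)
num_voxels, channels = 8, 4  # toy sizes, chosen arbitrarily

# Random weights stand in for parameters that would normally be learned.
params = {
    name: rng.normal(scale=0.1, size=(2 * channels, channels))
    for name in ("Wz", "Wr", "Wh")
}

# Process three synthetic "fragments" in sequence.
hidden = np.zeros((num_voxels, channels))
for _ in range(3):
    fragment_features = rng.normal(size=(num_voxels, channels))
    hidden = gru_fuse(hidden, fragment_features, params)
```

Because each update is a gated blend of the previous state and a new candidate, information from earlier fragments can persist in `hidden`, which is the intuition behind fusing features sequentially rather than reconstructing each fragment in isolation.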