Recurrent Neural Network for Learning Dense Depth and Ego-Motion from Video

Abstract

Learning-based, single-view depth estimation often generalizes poorly to unseen datasets. While learning-based, two-frame depth estimation solves this problem to some extent by learning to match features across frames, it performs poorly at large depths, where the uncertainty is high. There exist few learning-based, multi-view depth estimation methods. In this paper, we present a learning-based, multi-view dense depth map and ego-motion estimation method that uses Recurrent Neural Networks (RNNs). Our model is designed for 3D reconstruction from video, where the input frames are temporally correlated. It also generalizes to single- or two-view dense depth estimation. Compared to recent single- or two-view CNN-based depth estimation methods, our model leverages more views and achieves more accurate results, especially at large distances. Our method outperforms state-of-the-art learning-based, single- or two-view depth estimation methods on both indoor and outdoor benchmark datasets. We also demonstrate that our method can work even on extremely difficult sequences, such as endoscopic video, where none of the assumptions of traditional 3D reconstruction methods (static scene, constant lighting, Lambertian reflection, etc.) hold.
