Depth sensing is crucial for 3D reconstruction and scene understanding.Active depth sensors provide dense metric measurements, but often suffer fromlimitations such as restricted operating ranges, low spatial resolution, sensorinterference, and high power consumption. In this paper, we propose a deeplearning (DL) method to estimate per-pixel depth and its uncertaintycontinuously from a monocular video stream, with the goal of effectivelyturning an RGB camera into an RGB-D camera. Unlike prior DL-based methods, weestimate a depth probability distribution for each pixel rather than a singledepth value, leading to an estimate of a 3D depth probability volume for eachinput frame. These depth probability volumes are accumulated over time under aBayesian filtering framework as more incoming frames are processedsequentially, which effectively reduces depth uncertainty and improvesaccuracy, robustness, and temporal stability. Compared to prior work, theproposed approach achieves more accurate and stable results, and generalizesbetter to new datasets. Experimental results also show the output of ourapproach can be directly fed into classical RGB-D based 3D scanning methods for3D scene reconstruction.