Abstract
In this work, we present a new multi-view depth estimation method thatutilizes both conventional SfM reconstruction and learning-based priors overthe recently proposed neural radiance fields (NeRF). Unlike existing neuralnetwork based optimization method that relies on estimated correspondences, ourmethod directly optimizes over implicit volumes, eliminating the challengingstep of matching pixels in indoor scenes. The key to our approach is to utilizethe learning-based priors to guide the optimization process of NeRF. Our systemfirstly adapts a monocular depth network over the target scene by finetuning onits sparse SfM reconstruction. Then, we show that the shape-radiance ambiguityof NeRF still exists in indoor environments and propose to address the issue byemploying the adapted depth priors to monitor the sampling process of volumerendering. Finally, a per-pixel confidence map acquired by error computation onthe rendered image can be used to further improve the depth quality.Experiments show that our proposed framework significantly outperformsstate-of-the-art methods on indoor scenes, with surprising findings presentedon the effectiveness of correspondence-based optimization and NeRF-basedoptimization over the adapted depth priors. In addition, we show that theguided optimization scheme does not sacrifice the original synthesis capabilityof neural radiance fields, improving the rendering quality on both seen andnovel views. Code is available at https://github.com/weiyithu/NerfingMVS.