Abstract
This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. The result captures geometry details such as cloth wrinkles, which are important in visualization applications. To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively. We design a training strategy to ensure both base and detail shapes can be faithfully learned by the corresponding network branches. Furthermore, we introduce a novel network layer to fuse a rough depth map and surface normals to further improve the final result. Quantitative comparison with fused 'ground truth' captured by real depth cameras and qualitative examples on unconstrained Internet images demonstrate the strength of the proposed method.
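The base/detail separation described above can be illustrated with a minimal sketch. It is not the network or the training strategy of the paper: it only shows one way a depth map could be split into a smooth base and a residual detail layer that sum back to the original. The box filter, the `radius` parameter, and the function names `box_blur` and `split_depth` are assumptions made for illustration.

```python
import numpy as np


def box_blur(depth: np.ndarray, radius: int) -> np.ndarray:
    """Smooth a 2D map with a (2*radius+1)^2 box filter and edge padding."""
    k = 2 * radius + 1
    height, width = depth.shape
    padded = np.pad(depth.astype(np.float64), radius, mode="edge")
    out = np.zeros((height, width), dtype=np.float64)
    # Sum the k*k shifted copies of the padded map, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (k * k)


def split_depth(depth: np.ndarray, radius: int = 2):
    """Split a depth map into a smooth base and a residual detail layer.

    Assumption: the base is approximated by a box-filtered depth map.
    By construction, base + detail reproduces the input depth.
    """
    base = box_blur(depth, radius)
    detail = depth.astype(np.float64) - base
    return base, detail


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic depth: a smooth ramp plus small high-frequency "wrinkles".
    ramp = np.linspace(1.0, 2.0, 32)[None, :].repeat(32, axis=0)
    depth = ramp + 0.01 * rng.standard_normal((32, 32))
    base, detail = split_depth(depth, radius=2)
    print("max reconstruction error:", np.abs(base + detail - depth).max())
```

In this sketch the detail layer carries only the high-frequency residual, which is the kind of signal (for example, cloth wrinkles) that a dedicated branch would be asked to regress, while the base layer carries the overall shape.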