Abstract
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent based, fully differentiable, and presents a complete departure from conventional SfM).
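To make the objective concrete, the following is a minimal sketch of how depth, intrinsics, and a relative pose can induce an optical flow field that is then compared with a target flow in a least-squares sense. It is an illustration under stated assumptions, not the FlowMap implementation: the function names `induced_flow` and `flow_loss`, the pinhole intrinsics matrix `K`, the pixel-center convention, and the convention that `T_rel` maps points from the first frame's camera coordinates to the second frame's are all hypothetical choices made here. NumPy is used for clarity; an actual per-video optimization would require an automatic-differentiation framework.

```python
import numpy as np

def induced_flow(depth, K, T_rel):
    """Flow from frame i to frame j implied by depth (h, w), intrinsics K (3, 3),
    and relative pose T_rel (4, 4). Conventions here are assumptions for illustration."""
    h, w = depth.shape
    # Pixel coordinates (x, y) at pixel centers, in homogeneous form.
    ys, xs = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (h, w, 3)
    # Unproject each pixel to a 3D point in frame i: X = depth * K^{-1} [x, y, 1]^T.
    pts_i = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Move the points into frame j's camera coordinates.
    pts_j = pts_i @ T_rel[:3, :3].T + T_rel[:3, 3]
    # Project back to pixels in frame j.
    proj = pts_j @ K.T
    xy_j = proj[..., :2] / proj[..., 2:3]
    # The induced flow is the displacement of each pixel.
    return xy_j - pix[..., :2]

def flow_loss(depth, K, T_rel, target_flow):
    """Mean squared difference between induced flow and a target flow (h, w, 2)."""
    diff = induced_flow(depth, K, T_rel) - target_flow
    return np.mean(np.sum(diff ** 2, axis=-1))
```

In this sketch, an identity pose yields zero induced flow, and a pure sideways translation of a fronto-parallel plane yields a constant horizontal flow equal to the focal length times the translation divided by the depth.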