Abstract
We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without heavy computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
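As a rough illustration of the two ideas named above, the following Python sketch shows how an online frame stream might be grouped into overlapping windows while a global pool accumulates one compact token per frame. It is not the WinT3R implementation: the window size, the stride, the `CameraTokenPool` class, and the `toy_camera_token` placeholder are all assumptions made for this example, and the real model would produce learned tokens rather than frame indices.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List


def sliding_windows(
    frames: Iterable[int], window_size: int = 4, stride: int = 2
) -> Iterator[List[int]]:
    """Yield overlapping windows from a stream (sizes are assumed values)."""
    if window_size < 1 or stride < 1:
        raise ValueError("window_size and stride must be positive")
    buf: deque = deque(maxlen=window_size)  # oldest frames drop out automatically
    emitted = False
    new_frames = 0
    for frame in frames:
        buf.append(frame)
        new_frames += 1
        # Emit the first full window, then one window every `stride` new frames.
        if len(buf) == window_size and (not emitted or new_frames >= stride):
            yield list(buf)
            emitted = True
            new_frames = 0


class CameraTokenPool:
    """Hypothetical global store holding one compact token per frame."""

    def __init__(self) -> None:
        self._tokens: Dict[int, List[float]] = {}

    def add(self, frame_id: int, token: List[float]) -> None:
        # Keep the first token seen for a frame; later windows do not overwrite it.
        self._tokens.setdefault(frame_id, token)

    def __len__(self) -> int:
        return len(self._tokens)


def toy_camera_token(frame_id: int) -> List[float]:
    # Placeholder standing in for a learned compact camera representation.
    return [float(frame_id)]


def run_stream(num_frames: int, window_size: int = 4, stride: int = 2):
    """Process a toy stream and return the windows and the filled pool."""
    pool = CameraTokenPool()
    windows = []
    for window in sliding_windows(range(num_frames), window_size, stride):
        for frame_id in window:
            pool.add(frame_id, toy_camera_token(frame_id))
        windows.append(window)
    return windows, pool
```

In this sketch the work done per step is bounded by the window size, while the pool grows by one small token per frame, which is the kind of cost profile the abstract describes. Frames that arrive after the last emitted window but before the next stride boundary are not emitted here; a real system would need a flush step for them.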