WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

  • 2025-09-05 17:59:47
  • Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He

Abstract

We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without large computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
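As a rough illustration of the two ideas named in the abstract, the following sketch shows a fixed-size sliding window over an incoming frame stream alongside a pool that accumulates one compact entry per frame. It is not the WinT3R implementation: the function name `stream_windows`, the `window_size` parameter, and the string stand-ins for camera tokens are all hypothetical, and the abstract does not specify the window size, stride, or how tokens are computed.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def stream_windows(
    frames: Iterable[str], window_size: int = 4
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (current window, camera token pool snapshot) per incoming frame.

    Hypothetical sketch: frames are plain strings and each "camera token"
    is a placeholder string, standing in for a learned compact embedding.
    """
    # The deque drops the oldest frame automatically once it is full,
    # so per-step work is bounded by window_size.
    window: Deque[str] = deque(maxlen=window_size)
    # The pool keeps one small entry per frame seen so far (assumption),
    # so it grows with the stream while the window stays bounded.
    camera_token_pool: List[str] = []

    for frame in frames:
        window.append(frame)
        camera_token_pool.append(f"cam_token({frame})")
        yield list(window), list(camera_token_pool)


if __name__ == "__main__":
    for win, pool in stream_windows(["f0", "f1", "f2", "f3", "f4"], window_size=3):
        print(win, len(pool))
```

The point of the sketch is the asymmetry: the window holds only a few frames at a time, while the pool retains a lightweight record of every frame, which is how a global pool can inform pose estimation without reprocessing the full history.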

 
