Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion

  • 2025-08-20 16:41:54
  • Mona Sheikh Zeinoddin, Mobarak I. Hoque, Zafer Tandogdu, Greg Shaw, Matthew J. Clarkson, Evangelos Mazomenos, Danail Stoyanov

Abstract

Accurate depth and camera pose estimation is essential for achieving high-quality 3D visualisations in robotic-assisted surgery. Despite recent advancements in foundation model adaptation to monocular depth estimation of endoscopic scenes via self-supervised learning (SSL), no prior work has explored their use for pose estimation. These methods rely on low-rank-based adaptation approaches, which constrain model updates to a low-rank space. We propose Endo-FASt3r, the first monocular SSL depth and pose estimation framework that uses foundation models for both tasks. We extend the Reloc3r relative pose estimation foundation model by designing Reloc3rX, which introduces the modifications necessary for convergence in SSL. We also present DoMoRA, a novel adaptation technique that enables higher-rank updates and faster convergence. Experiments on the SCARED dataset show that Endo-FASt3r achieves a substantial 10% improvement in pose estimation and a 2% improvement in depth estimation over prior work. Similar performance gains on the Hamlyn and StereoMIS datasets reinforce the generalisability of Endo-FASt3r across different datasets.
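The low-rank constraint mentioned above can be illustrated with a minimal sketch, assuming a LoRA-style factorisation in which a weight update is written as ΔW = B·A, with B of shape d_out × r and A of shape r × d_in. Whatever values B and A take, the rank of ΔW cannot exceed the inner dimension r. This only illustrates the constraint on prior approaches; it is not DoMoRA, whose mechanism is not described here, and the helper names `lora_update` and `matrix_rank` are hypothetical.

```python
from fractions import Fraction
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matrix_rank(m):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


def lora_update(d_out, d_in, r, seed=0):
    """Hypothetical LoRA-style update: delta_W = B @ A, B is d_out x r, A is r x d_in."""
    rng = random.Random(seed)
    B = [[rng.randint(-5, 5) for _ in range(r)] for _ in range(d_out)]
    A = [[rng.randint(-5, 5) for _ in range(d_in)] for _ in range(r)]
    return matmul(B, A)


# An 8 x 8 update built from rank-2 factors has rank at most 2.
delta_w = lora_update(8, 8, r=2)
print(len(delta_w), len(delta_w[0]), matrix_rank(delta_w))
```

A dense 8 × 8 update could reach rank 8, whereas the factorised one is capped at r. This cap is the restriction that a higher-rank adaptation scheme aims to relax.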
