LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS

Abstract

Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the disappearance and reappearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on video instance segmentation performance. Our approach achieves a J&F score of 0.7952 in the testing phase of the LSVOS challenge VOS track, ranking third overall.
