Moving Object Segmentation: All You Need Is SAM (and Flow)

Abstract

The objective of this paper is motion segmentation: discovering and segmenting the moving objects in a video. This is a much-studied area with numerous careful, and sometimes complex, approaches and training schemes, including self-supervised learning, learning from synthetic datasets, object-centric representations, amodal representations, and many more. Our interest in this paper is to determine whether the Segment Anything model (SAM) can contribute to this task. We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects. In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt. These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks. We also extend these frame-level segmentations to sequence-level segmentations that maintain object identity. Again, this simple model outperforms previous methods on multiple video object segmentation benchmarks.
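
As an illustration of the first model's premise, the sketch below shows one common way a dense flow field can be rendered as a three-channel image, so that a network built for RGB could in principle consume it. The abstract does not specify how flow is presented to SAM, so this encoding is an assumption: hue encodes motion direction and brightness encodes per-frame normalized magnitude. The function name `flow_to_rgb` and the nested-list layout are hypothetical, and only the Python standard library is used to keep the sketch dependency-free.

```python
import colorsys
import math


def flow_to_rgb(flow):
    """Render a dense flow field as an RGB image (illustrative assumption only).

    flow: H x W nested list of (dx, dy) displacement tuples.
    Returns an H x W nested list of (r, g, b) integer tuples in 0..255.
    """
    # Normalize magnitudes by the largest displacement in this frame.
    max_mag = max(math.hypot(dx, dy) for row in flow for dx, dy in row)
    scale = max_mag if max_mag > 0 else 1.0

    image = []
    for row in flow:
        out_row = []
        for dx, dy in row:
            # Direction -> hue in [0, 1); magnitude -> brightness in [0, 1].
            hue = ((math.atan2(dy, dx) + math.pi) / (2 * math.pi)) % 1.0
            value = math.hypot(dx, dy) / scale
            r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
            out_row.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(out_row)
    return image


# A 3 x 3 static background with a single pixel moving to the right.
flow = [[(0.0, 0.0)] * 3 for _ in range(3)]
flow[1][1] = (2.0, 0.0)
image = flow_to_rgb(flow)
```

Under this encoding, static pixels render black and moving pixels render as saturated colors, which is why a flow image can make moving objects stand out from the background even when their appearance in RGB does not.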
