Investigation of Frame Differences as Motion Cues for Video Object Segmentation

Abstract

Automatic Video Object Segmentation (AVOS) refers to the task of autonomously segmenting target objects in video sequences without relying on human-provided annotations in the first frames. In AVOS, the use of motion information is crucial, with optical flow being a commonly employed method for capturing motion cues. However, the computation of optical flow is resource-intensive, making it unsuitable for real-time applications, especially on edge devices with limited computational resources. In this study, we propose using frame differences as an alternative to optical flow for motion cue extraction. We developed an extended U-Net-like AVOS model that takes a frame on which segmentation is performed and a frame difference as inputs, and outputs an estimated segmentation map. Our experimental results demonstrate that the proposed model achieves performance comparable to the model with optical flow as an input, particularly when applied to videos captured by stationary cameras. Our results suggest the usefulness of employing frame differences as motion cues in cases with limited computational resources.
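To make the notion of a frame difference concrete, the following is a minimal sketch of one common way to compute it: the absolute per-pixel difference between the current frame and the previous one. The function name `frame_difference`, the use of the immediately preceding frame, the absolute value, and the scaling to [0, 1] are all assumptions made for illustration; the abstract does not specify the exact formulation or how the result is fed into the model.

```python
import numpy as np


def frame_difference(current: np.ndarray, previous: np.ndarray) -> np.ndarray:
    """Absolute per-pixel difference between two uint8 frames, scaled to [0, 1].

    Hypothetical helper for illustration; not taken from the paper.
    """
    if current.shape != previous.shape:
        raise ValueError("frames must have the same shape")
    # Cast to a signed type first so the uint8 subtraction cannot wrap around.
    diff = np.abs(current.astype(np.int16) - previous.astype(np.int16))
    return diff.astype(np.float32) / 255.0


# Toy example: a single bright pixel appears between two otherwise black frames.
previous = np.zeros((4, 4, 3), dtype=np.uint8)
current = previous.copy()
current[1, 1] = 255
motion_cue = frame_difference(current, previous)
```

With a stationary camera, static background pixels cancel out and only the changed pixel is non-zero in `motion_cue`. This costs a single subtraction per pixel, which is why such a cue is attractive when optical flow is too expensive to compute.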
