PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation

Abstract

Panoramic videos contain richer spatial information and have attracted tremendous attention because of the immersive experience they offer in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation focus only on conventional planar images. To fill this gap, we present a panoramic video dataset, PanoVOS, which provides 150 high-resolution videos with diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Error analysis shows that all of them fail to handle the pixel-level content discontinuities of panoramic videos. We therefore propose the Panoramic Space Consistency Transformer (PSCFormer), which effectively uses the semantic boundary information of the previous frame for pixel-level matching with the current frame. Extensive experiments show that PSCFormer achieves substantially better segmentation results than previous state-of-the-art (SOTA) models in the panoramic setting. Our dataset poses new challenges for panoramic VOS, and we hope that PanoVOS can advance the development of panoramic segmentation and tracking.
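The abstract does not spell out what the content discontinuity is. One likely source, assumed here rather than taken from the paper, is the horizontal seam of the equirectangular projection: the left and right image borders both lie at longitude ±180°, so an object crossing that seam is split into two pieces at opposite edges of the frame. The hypothetical helper `count_components` below (plain Python, not part of PanoVOS or PSCFormer) illustrates the effect by counting mask regions with and without treating the two borders as neighbours.

```python
from collections import deque


def count_components(mask, wrap_horizontal=False):
    """Count 4-connected foreground regions in a binary mask (list of lists).

    Hypothetical illustration only. With wrap_horizontal=True, column 0 and
    column W-1 are treated as adjacent, mimicking the 360-degree seam of an
    equirectangular panorama.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Found an unvisited foreground pixel: flood-fill its region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if wrap_horizontal:
                        nx %= w  # step off one border, re-enter at the other
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# One object straddling the left/right seam of an 8-pixel-wide frame.
seam_mask = [
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
]
print(count_components(seam_mask))                        # planar view: 2
print(count_components(seam_mask, wrap_horizontal=True))  # spherical view: 1
```

A model that treats the frame as an ordinary planar image sees two unrelated fragments here, whereas on the sphere they form a single object; this is one concrete way a planar VOS model can lose track of a target in panoramic video.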
