Self-Supervised Learning for Semi-Supervised Temporal Action Proposal

Abstract

Self-supervised learning has shown remarkable performance in exploiting unlabeled data for various video tasks. In this paper, we focus on applying the power of self-supervised methods to improve semi-supervised action proposal generation. In particular, we design an effective Self-supervised Semi-supervised Temporal Action Proposal (SSTAP) framework. SSTAP contains two crucial branches: a temporal-aware semi-supervised branch and a relation-aware self-supervised branch. The semi-supervised branch improves the proposal model by introducing two temporal perturbations, temporal feature shift and temporal feature flip, in the mean teacher framework. The self-supervised branch defines two pretext tasks, masked feature reconstruction and clip-order prediction, to learn the relations among temporal clues. In this way, SSTAP can better exploit unlabeled videos and improve the discriminative ability of the learned action features. We extensively evaluate the proposed SSTAP on the THUMOS14 and ActivityNet v1.3 datasets. The experimental results demonstrate that SSTAP significantly outperforms state-of-the-art semi-supervised methods and even matches fully-supervised methods. Code is available at https://github.com/wangxiang1230/SSTAP.
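To make the two temporal perturbations and the mean teacher update more concrete, the following is a minimal NumPy sketch, not the released SSTAP code. It assumes features are stored as a (channels, time) array; the function names, the `shift_ratio` and `decay` parameters, the one-step shift, and the zero padding are illustrative assumptions rather than details stated in the abstract.

```python
import numpy as np


def temporal_feature_flip(features):
    """Reverse a (channels, time) feature sequence along the time axis."""
    return features[:, ::-1].copy()


def temporal_feature_shift(features, shift_ratio=0.25):
    """Shift some channels by one step along time, zero-padding the gaps.

    Assumption: the first `shift_ratio` fraction of channels moves one step
    forward in time, the next fraction moves one step backward, and the
    remaining channels are left untouched.
    """
    channels, _ = features.shape
    n = int(channels * shift_ratio)
    shifted = features.copy()
    if n > 0:
        # Forward shift: the value at time t moves to time t + 1.
        shifted[:n, 1:] = features[:n, :-1]
        shifted[:n, 0] = 0.0
        # Backward shift: the value at time t moves to time t - 1.
        shifted[n:2 * n, :-1] = features[n:2 * n, 1:]
        shifted[n:2 * n, -1] = 0.0
    return shifted


def ema_update(teacher_weights, student_weights, decay=0.999):
    """Mean-teacher style update: the teacher tracks an exponential
    moving average of the student weights."""
    return decay * teacher_weights + (1.0 - decay) * student_weights


if __name__ == "__main__":
    x = np.arange(12, dtype=float).reshape(4, 3)  # 4 channels, 3 time steps
    print(temporal_feature_flip(x))
    print(temporal_feature_shift(x, shift_ratio=0.25))
```

In a mean teacher setup of this kind, the student would typically see the perturbed features while a consistency loss ties its predictions to those of the teacher; how SSTAP combines these pieces is described in the full paper and the linked repository.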
