Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning

Abstract

Conventional video summarization approaches based on reinforcement learninghave the problem that the reward can only be received after the whole summaryis generated. Such kind of reward is sparse and it makes reinforcement learninghard to converge. Another problem is that labelling each frame is tedious andcostly, which usually prohibits the construction of large-scale datasets. Tosolve these problems, we propose a weakly supervised hierarchical reinforcementlearning framework, which decomposes the whole task into several subtasks toenhance the summarization quality. This framework consists of a manager networkand a worker network. For each subtask, the manager is trained to set a subgoalonly by a task-level binary label, which requires much fewer labels thanconventional approaches. With the guide of the subgoal, the worker predicts theimportance scores for video frames in the subtask by policy gradient accordingto both global reward and innovative defined sub-rewards to overcome the sparseproblem. Experiments on two benchmark datasets show that our proposal hasachieved the best performance, even better than supervised approaches.

Quick Read (beta)

loading the full paper ...