Abstract
For many applications with limited computation, communication, storage, and energy resources, there is a pressing need for computer vision methods that can select an informative subset of the input video for efficient processing at or near real time. In the literature, there are two relevant groups of approaches: generating a trailer for a video, or fast-forwarding while watching or processing the video. The first group is supported by video summarization techniques, which require processing the entire video to select an important subset to show to users. In the second group, current fast-forwarding methods depend on either manual control or automatic adaptation of playback speed; these often fail to present an accurate representation of the video and may still require processing every frame. In this paper, we introduce FastForwardNet (FFNet), a reinforcement learning agent that draws inspiration from video summarization and approaches fast-forwarding differently. It is an online framework that automatically fast-forwards a video and presents a representative subset of frames to users on the fly. It does not require processing the entire video, but only the portion selected by the fast-forward agent, which makes the process very computationally efficient. The online nature of our proposed method also enables users to begin fast-forwarding at any point in the video. Experiments on two real-world datasets demonstrate that our method can provide a better representation of the input video with much lower processing requirements.
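The online fast-forwarding loop described above can be illustrated with a minimal sketch. It is not the FFNet implementation: the names `fast_forward` and `toy_policy` are hypothetical, the frames are plain floats standing in for per-frame features, and a hand-written rule replaces the learned reinforcement learning policy. The sketch shows only the control flow in which the agent looks at the current frame, chooses how far to skip, and never touches the skipped frames.

```python
from typing import Callable, List, Sequence


def fast_forward(
    frames: Sequence[float],
    choose_skip: Callable[[float], int],
    start: int = 0,
) -> List[int]:
    """Return the indices of the frames that were actually processed.

    `choose_skip` is a stand-in for a learned policy (an assumption of
    this sketch): given the current frame, it returns how many frames
    to jump ahead. Skipped frames are never read.
    """
    processed: List[int] = []
    i = start
    while i < len(frames):
        processed.append(i)                    # only this frame is read
        step = max(1, choose_skip(frames[i]))  # always move forward
        i += step
    return processed


def toy_policy(frame: float) -> int:
    """Hypothetical rule: step slowly through 'interesting' frames
    (value above 0.5) and jump four frames ahead otherwise."""
    return 1 if frame > 0.5 else 4


# Toy video: a quiet stretch, a short active stretch, another quiet stretch.
frames = [0.1] * 8 + [0.9] * 4 + [0.1] * 8
selected = fast_forward(frames, toy_policy)
resumed = fast_forward(frames, toy_policy, start=10)
```

In this toy run only 8 of the 20 frames are read, and the `start` argument reflects the point that an online method can begin fast-forwarding from any position in the video.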