Abstract
Deep-learning-based video processing has yielded transformative results in recent years. However, the video analytics pipeline is energy-intensive due to high data rates and reliance on complex inference algorithms, which limits its adoption in energy-constrained applications. Motivated by the observation of high and variable spatial redundancy and temporal dynamics in video data streams, we design and evaluate an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines. Instead of heuristically tuning the input data resolution of individual tasks, our framework utilizes deep reinforcement learning to dynamically govern the input resolution and computation of the entire video analytics pipeline. By monitoring the impact of varying resolution on the quality of high-dimensional video analytics features, and hence on the accuracy of video analytics results, the proposed end-to-end optimization framework learns the best non-myopic policy for dynamically controlling the resolution of input video streams to globally optimize energy efficiency. Governed by reinforcement learning, optical flow is incorporated into the framework to minimize unnecessary spatio-temporal redundancy that leads to re-computation, while preserving accuracy. The proposed framework is applied to video instance segmentation, which is one of the most challenging computer vision tasks, and achieves better energy efficiency than all baseline methods of similar accuracy on the YouTube-VIS dataset.