Deep-learning-based video processing has yielded transformative results inrecent years. However, the video analytics pipeline is energy-intensive due tohigh data rates and reliance on complex inference algorithms, which limits itsadoption in energy-constrained applications. Motivated by the observation ofhigh and variable spatial redundancy and temporal dynamics in video datastreams, we design and evaluate an adaptive-resolution optimization frameworkto minimize the energy use of multi-task video analytics pipelines. Instead ofheuristically tuning the input data resolution of individual tasks, ourframework utilizes deep reinforcement learning to dynamically govern the inputresolution and computation of the entire video analytics pipeline. Bymonitoring the impact of varying resolution on the quality of high-dimensionalvideo analytics features, hence the accuracy of video analytics results, theproposed end-to-end optimization framework learns the best non-myopic policyfor dynamically controlling the resolution of input video streams to achieveglobally optimize energy efficiency. Governed by reinforcement learning,optical flow is incorporated into the framework to minimize unnecessaryspatio-temporal redundancy that leads to re-computation, while preservingaccuracy. The proposed framework is applied to video instance segmentationwhich is one of the most challenging machine vision tasks, and the energyconsumption efficiency of the proposed framework has significantly surpassedall baseline methods of similar accuracy on the YouTube-VIS dataset.