Video object segmentation is challenging yet important in a wide variety of applications for video analysis. Recent works formulate video object segmentation as a prediction task using deep nets to achieve appealing state-of-the-art performance. Due to the formulation as a prediction task, most of these methods require fine-tuning during test time, such that the deep nets memorize the appearance of the objects of interest in the given video. However, fine-tuning is time-consuming and computationally expensive, hence the algorithms are far from real time. To address this issue, we develop a novel matching-based algorithm for video object segmentation. In contrast to memorization-based classification techniques, the proposed approach learns to match extracted features to a provided template without memorizing the appearance of the objects. We validate the effectiveness and the robustness of the proposed method on the challenging DAVIS-16, DAVIS-17, Youtube-Objects and JumpCut datasets. Extensive results show that our method achieves comparable performance without fine-tuning and is much more favorable in terms of computational time.