Coloring line art images based on the colors of reference images is an important stage in animation production, but it is time-consuming and tedious. In this paper, we propose a deep architecture that automatically colors line art videos in the same color style as the given reference images. Our framework consists of a color transform network and a temporal constraint network. The color transform network takes as input the target line art images, together with the line art and color images of one or more reference images, and generates the corresponding target color images. To cope with large differences between the target line art image and the reference color images, our architecture uses non-local similarity matching to determine region correspondences between the target image and the reference images; these correspondences are used to transfer local color information from the references to the target. To ensure global color style consistency, we further incorporate Adaptive Instance Normalization (AdaIN), whose transformation parameters are obtained from a style embedding vector that describes the global color style of the references and is extracted by an embedder. The temporal constraint network takes the reference images and the target image together in chronological order and learns spatiotemporal features through 3D convolution to ensure temporal consistency between the target image and the reference images. When dealing with an animation in a new style, our model can achieve even better coloring results by fine-tuning its parameters on only a small number of samples. To evaluate our method, we build a line art coloring dataset. Experiments show that our method achieves the best performance on line art video coloring compared with state-of-the-art methods and other baselines.
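To make the two core operations of the color transform network more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the region correspondence is computed as scaled dot-product similarity followed by a softmax over all reference positions (the standard non-local / attention formulation), and it uses the standard per-channel AdaIN formula. The function names `nonlocal_color_transfer` and `adain` are hypothetical, and `gamma`/`beta` are passed in directly rather than predicted from the style embedding by the embedder, which is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of similarity scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def nonlocal_color_transfer(target_feats: Matrix, ref_feats: Matrix,
                            ref_colors: Matrix) -> Matrix:
    """Hypothetical sketch of non-local similarity matching.

    target_feats: N_t x C features of the target line art (one row per position)
    ref_feats:    N_r x C features of the reference line art
    ref_colors:   N_r x D color features of the reference color image
    Returns an N_t x D matrix of colors transferred from the references.
    Assumption: scaled dot-product similarity + softmax; the paper's exact
    similarity function may differ.
    """
    scale = 1.0 / math.sqrt(len(target_feats[0]))
    num_color_dims = len(ref_colors[0])
    out = []
    for t in target_feats:
        # Similarity between this target position and every reference position.
        sims = [scale * sum(a * b for a, b in zip(t, r)) for r in ref_feats]
        weights = softmax(sims)
        # Soft correspondence: weighted average of all reference colors.
        out.append([sum(w * c[d] for w, c in zip(weights, ref_colors))
                    for d in range(num_color_dims)])
    return out


def adain(channel: List[float], gamma: float, beta: float,
          eps: float = 1e-5) -> List[float]:
    """Standard AdaIN on one feature channel.

    Normalizes the channel to zero mean and unit variance, then applies the
    style scale `gamma` and shift `beta` (assumed given here; in the paper
    they come from the style embedding vector).
    """
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    std = math.sqrt(var + eps)
    return [gamma * (v - mean) / std + beta for v in channel]


if __name__ == "__main__":
    # Two target positions, two reference positions with distinct colors.
    target = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[10.0, 0.0], [0.0, 10.0]]
    colors = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # red, blue
    print(nonlocal_color_transfer(target, reference, colors))
    print(adain([1.0, 2.0, 3.0], gamma=2.0, beta=0.5))
```

In the usage example, each target position matches the reference position with the most similar features, so the first target position receives almost pure red and the second almost pure blue. The softmax keeps the transfer differentiable, which is why attention-style matching is a common choice for learning correspondences end to end; the quadratic cost in the number of positions is the main trade-off.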