The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Because of the locality and spatial invariance of CNNs, it is difficult to extract and maintain the global information of input images. As a result, traditional neural style transfer methods are usually biased, and content leak can be observed when the style transfer process is applied repeatedly with the same reference style image. To address this critical issue, we take long-range dependencies of input images into account for unbiased style transfer by proposing a transformer-based approach, namely StyTr^2. In contrast with visual transformers for other vision tasks, our StyTr^2 contains two different transformer encoders that generate domain-specific sequences for content and style, respectively. Following the encoders, a multi-layer transformer decoder is adopted to stylize the content sequence according to the style sequence. In addition, we analyze the deficiencies of existing positional encoding methods and propose content-aware positional encoding (CAPE), which is scale-invariant and better suited to image style transfer tasks. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed StyTr^2 compared with state-of-the-art CNN-based and flow-based approaches.
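To make the decoder step concrete, the following is a minimal sketch of how a content sequence might be stylized according to a style sequence through cross-attention. It is an illustration under stated assumptions, not the StyTr^2 implementation: it assumes the content tokens act as queries and the style tokens as keys and values, uses a single attention head, omits all learned projections, the encoders, and CAPE, and the names `cross_attention`, `content_seq`, and `style_seq` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(content_seq, style_seq):
    """Single-head cross-attention without learned projections (assumption).

    Each content token is used as a query over all style tokens, which serve
    as both keys and values. The attended style vector is added back to the
    content token as a residual connection.
    """
    dim = len(content_seq[0])
    scale = 1.0 / math.sqrt(dim)
    stylized, weights = [], []
    for query in content_seq:
        # Scaled dot-product score between this content token and each style token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in style_seq]
        attn = softmax(scores)
        # Attention-weighted sum of the style tokens.
        mixed = [sum(w * value[i] for w, value in zip(attn, style_seq)) for i in range(dim)]
        stylized.append([q + m for q, m in zip(query, mixed)])
        weights.append(attn)
    return stylized, weights


# Toy sequences: 3 content tokens and 2 style tokens, each of dimension 2.
content_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
style_seq = [[0.5, -0.5], [2.0, 0.0]]
stylized, attn = cross_attention(content_seq, style_seq)
```

The output has one token per content token, so the spatial layout of the content is preserved, while every output token can draw on every style token regardless of position. That global reach is the kind of long-range dependency that the locality of CNNs makes hard to capture.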