Swapping text in scene images while preserving original fonts, colors, sizes and background textures is a challenging task due to the complex interplay between different factors. In this work, we present SwapText, a three-stage framework to transfer texts across scene images. First, a novel text swapping network is proposed to replace text labels only in the foreground image. Second, a background completion network is learned to reconstruct background images. Finally, the generated foreground image and background image are used to generate the word image by the fusion network. Using the proposed framework, we can manipulate the texts of the input images even with severe geometric distortion. Qualitative and quantitative results are presented on several scene text datasets, including regular and irregular text datasets. We conducted extensive experiments to demonstrate the usefulness of our method in applications such as image-based text translation and text image synthesis.
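
The data flow of the three stages can be illustrated with a minimal sketch. It is not the SwapText implementation: the class name `SwapPipeline`, the three callables, and their input signatures (in particular, that the swapping stage receives a rendering of the target text) are assumptions made for illustration, and plain strings stand in for images and for the learned networks.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SwapPipeline:
    """Hypothetical wiring of the three stages; each callable stands in for a learned network."""

    # Assumed signature: (styled source image, target text input) -> swapped foreground
    swap_text: Callable[[Any, Any], Any]
    # Assumed signature: styled source image -> background with the text removed
    complete_background: Callable[[Any], Any]
    # Assumed signature: (foreground, background) -> final word image
    fuse: Callable[[Any, Any], Any]

    def __call__(self, styled_source: Any, target_text: Any) -> Any:
        # Stage 1: replace the text in the foreground only.
        foreground = self.swap_text(styled_source, target_text)
        # Stage 2: reconstruct the background of the source image.
        background = self.complete_background(styled_source)
        # Stage 3: combine both outputs into the word image.
        return self.fuse(foreground, background)


# Toy stand-ins that only record the order in which the stages are composed.
pipeline = SwapPipeline(
    swap_text=lambda src, tgt: f"fg({src},{tgt})",
    complete_background=lambda src: f"bg({src})",
    fuse=lambda fg, bg: f"fuse({fg},{bg})",
)
result = pipeline("src", "tgt")
```

In this sketch the foreground and background stages both read the source image independently, and only the fusion stage sees both outputs, which mirrors the ordering described above.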