Image-based virtual try-on is challenging because it must fit a target in-shop garment onto a reference person under diverse human poses. Previous works focus on preserving clothing details (e.g., texture, logos, patterns) when transferring the desired clothes onto a target person under a fixed pose. However, the performance of existing methods drops significantly when they are extended to multi-pose virtual try-on. In this paper, we propose an end-to-end Semantic Prediction Guidance multi-pose Virtual Try-On Network (SPG-VTON), which can fit the desired clothing onto a reference person under arbitrary poses. Concretely, SPG-VTON is composed of three sub-modules. First, a Semantic Prediction Module (SPM) generates the desired semantic map. The predicted semantic map provides richer guidance for locating the desired clothes region and producing a coarse try-on image. Second, a Clothes Warping Module (CWM) warps the in-shop clothes to the desired shape according to the predicted semantic map and the desired pose. Specifically, we introduce a conductible cycle consistency loss to alleviate misalignment in the clothes warping process. Third, a Try-on Synthesis Module (TSM) combines the coarse result and the warped clothes to generate the final virtual try-on image, which preserves the details of the desired clothes under the desired pose. In addition, we introduce a face identity loss that refines the facial appearance while maintaining the identity of the person in the final try-on result. We evaluate the proposed method on the largest multi-pose dataset (MPV) and on the DeepFashion dataset. Qualitative and quantitative experiments show that SPG-VTON outperforms state-of-the-art methods and is robust to data noise, including background changes and accessory changes such as hats and handbags, showing good scalability to real-world scenarios.
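
The data flow between the three sub-modules described above can be illustrated with a minimal sketch. This is not the SPG-VTON implementation: every function name, the toy resolution, and the fixed rectangular clothes region are hypothetical stand-ins, and the learned networks are replaced by trivial NumPy operations. It only shows which outputs feed which module (SPM produces a semantic map and a coarse result, CWM produces warped clothes, TSM fuses them).

```python
import numpy as np

H, W = 8, 6  # toy image resolution (assumption, for illustration only)


def semantic_prediction_module(person, clothes, target_pose):
    """Stand-in for SPM: returns (semantic_map, coarse_result).

    A real SPM would be a learned network; here the clothes region is a
    fixed rectangle and the coarse result is a copy of the person image.
    The pose argument is accepted but unused in this placeholder.
    """
    semantic_map = np.zeros((H, W), dtype=bool)
    semantic_map[2:6, 1:5] = True  # hypothetical predicted clothes region
    coarse_result = person.copy()
    return semantic_map, coarse_result


def clothes_warping_module(clothes, semantic_map, target_pose):
    """Stand-in for CWM: keeps clothes pixels only inside the predicted region.

    A real CWM would estimate a geometric warp from the semantic map and
    the pose; masking is used here purely as a placeholder.
    """
    return clothes * semantic_map[..., None]


def tryon_synthesis_module(coarse_result, warped_clothes, semantic_map):
    """Stand-in for TSM: takes warped clothes inside the region, coarse result elsewhere."""
    return np.where(semantic_map[..., None], warped_clothes, coarse_result)


def try_on_sketch(person, clothes, target_pose):
    """Chains the three placeholder modules in the order the abstract describes."""
    semantic_map, coarse = semantic_prediction_module(person, clothes, target_pose)
    warped = clothes_warping_module(clothes, semantic_map, target_pose)
    return tryon_synthesis_module(coarse, warped, semantic_map)


person = np.zeros((H, W, 3))   # toy reference person image
clothes = np.ones((H, W, 3))   # toy in-shop clothes image
pose = np.zeros((18, 2))       # hypothetical keypoint array; unused by the stand-ins
result = try_on_sketch(person, clothes, pose)
print(result.shape)
```

In this toy setting the output equals the clothes image inside the placeholder region and the person image elsewhere; the losses mentioned in the abstract (conductible cycle consistency, face identity) are training-time terms and are not modeled here.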