We study how to synthesize novel views of the human body from a single image. Though recent deep-learning-based methods work well for rigid objects, they often fail on objects with large articulation, such as human bodies. The core step of existing methods is to fit, with CNNs, a mapping from the observed views to novel views; however, the rich articulation modes of the human body make it rather challenging for CNNs to memorize and interpolate the data well. To address this problem, we propose a novel deep-learning-based pipeline that explicitly estimates and leverages the geometry of the underlying human body. Our pipeline is a composition of a shape estimation network and an image generation network; at the interface between them, a perspective transformation is applied to generate a forward flow for pixel value transportation. This design factors out the space of data variation and makes learning at each step much easier. Empirically, we show that performance on pose-varying objects can be improved dramatically. Our method can also be applied to real data captured by 3D sensors, and the flow it generates can be used to produce high-quality results at higher resolution.
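The interface step, a perspective transformation that turns estimated geometry into a forward flow and then transports pixel values along it, can be illustrated with a small sketch. The abstract does not give the exact formulation, so the following rests on several assumptions. It assumes the estimated shape is a per-pixel depth map, the camera is a pinhole with intrinsics `fx, fy, cx, cy`, and the target view is reached by a known rigid motion `(R, t)`. The function names `forward_flow` and `splat` are hypothetical. Nearest-pixel splatting with a z-buffer is a simple stand-in for pixel value transportation, not necessarily the scheme used in this work.

```python
from typing import List, Optional, Sequence, Tuple

# Per-pixel forward flow entry: (du, dv, depth in target camera), or None.
FlowEntry = Optional[Tuple[float, float, float]]


def forward_flow(depth: List[List[float]], fx: float, fy: float,
                 cx: float, cy: float,
                 R: Sequence[Sequence[float]],
                 t: Sequence[float]) -> List[List[FlowEntry]]:
    """Hypothetical sketch: unproject each pixel with its depth, apply the
    rigid motion X' = R X + t, and reproject with the same pinhole camera."""
    h, w = len(depth), len(depth[0])
    flow: List[List[FlowEntry]] = [[None] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:
                continue  # background or missing depth: no flow
            # Back-project pixel (u, v) at depth d into source-camera space.
            X = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Rigid transform into the target camera frame.
            Xp = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
            if Xp[2] <= 0:
                continue  # point lands behind the target camera
            # Perspective projection into the target image plane.
            up = fx * Xp[0] / Xp[2] + cx
            vp = fy * Xp[1] / Xp[2] + cy
            flow[v][u] = (up - u, vp - v, Xp[2])
    return flow


def splat(image: List[List[int]],
          flow: List[List[FlowEntry]]) -> List[List[Optional[int]]]:
    """Forward-warp pixel values along the flow to the nearest target pixel,
    keeping the closest surface when several pixels collide (z-buffer).
    Target pixels that receive nothing stay None (holes)."""
    h, w = len(image), len(image[0])
    out: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            entry = flow[v][u]
            if entry is None:
                continue
            du, dv, z = entry
            tu, tv = round(u + du), round(v + dv)
            if 0 <= tu < w and 0 <= tv < h and z < zbuf[tv][tu]:
                zbuf[tv][tu] = z
                out[tv][tu] = image[v][u]
    return out
```

For example, with a constant depth of 2, `fx = 2`, and a translation of 1 along the x axis, every pixel moves one column to the right (`fx * tx / d = 1`), and the leftmost column of the warped image becomes a hole. Holes and disocclusions of this kind are what a downstream image generation network would still need to fill.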