We propose a learning-based method for generating new animations of a cartoon character given a few example images. Our method is designed to learn from a traditionally animated sequence, where each frame is drawn by an artist, and thus the input images lack any common structure, correspondences, or labels. We express pose changes as a deformation of a layered 2.5D template mesh, and devise a novel architecture that learns to predict mesh deformations matching the template to a target image. This enables us to extract a common low-dimensional structure from a diverse set of character poses. We combine recent advances in differentiable rendering as well as mesh-aware models to successfully align a common template even if only a few character images are available during training. In addition to coarse poses, character appearance also varies due to shading, out-of-plane motions, and artistic effects. We capture these subtle changes by applying an image translation network to refine the mesh rendering, providing an end-to-end model to generate new animations of a character with high visual quality. We demonstrate that our generative model can be used to synthesize in-between frames and to create data-driven deformation. Our template fitting procedure outperforms state-of-the-art generic techniques for detecting image correspondences.