We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns the underlying object geometry and parts decomposition in an entirely unsupervised manner. This allows it to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. We primarily evaluate the framework on two video datasets: VoxCeleb $256^2$ and TEDXPeople $256^2$. In addition, on the Cats $256^2$ image dataset, we show that it learns compelling 3D geometry even from still images. Finally, we show that our model can obtain animatable 3D objects from a single image or a few images. Code and visual results are available on our project website (https://snap-research.github.io/unsupervised-volumetric-animation).
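
For readers unfamiliar with the Perspective-n-Point (PnP) problem referenced above, the following is a minimal sketch of what such a solver computes: a rigid pose (rotation and translation) that maps known 3D points onto their observed 2D projections. It is not the differentiable PnP used in our model; it swaps in the classical Direct Linear Transform (DLT) in plain NumPy for illustration. The function name `pnp_dlt`, the assumption of normalized image coordinates (camera intrinsics already removed), and the requirement of at least six non-coplanar, noise-free correspondences are all assumptions of this sketch.

```python
import numpy as np


def pnp_dlt(points_3d, points_2d):
    """Illustrative DLT solver for PnP (hypothetical helper, not the paper's method).

    points_3d: (N, 3) points in the object frame, N >= 6 and non-coplanar.
    points_2d: (N, 2) projections in normalized image coordinates.
    Returns (R, t) such that a point X projects to (R @ X + t), divided by its depth.
    """
    n = points_3d.shape[0]
    X_h = np.hstack([points_3d, np.ones((n, 1))])  # homogeneous 3D points, (N, 4)
    u = points_2d[:, 0:1]
    v = points_2d[:, 1:2]
    zeros = np.zeros_like(X_h)

    # Each correspondence yields two linear equations in the 12 entries of P = [R | t]:
    #   p1.X - u * p3.X = 0   and   p2.X - v * p3.X = 0
    A = np.vstack([
        np.hstack([X_h, zeros, -u * X_h]),
        np.hstack([zeros, X_h, -v * X_h]),
    ])  # (2N, 12)

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)

    # P equals [R | t] up to an unknown scale and sign. Project the left 3x3
    # block onto the nearest orthogonal matrix and rescale the translation.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()

    # A proper rotation has determinant +1; otherwise the sign was flipped.
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return R, t


if __name__ == "__main__":
    # Synthetic check: project random points with a known pose, then recover it.
    rng = np.random.default_rng(0)
    R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(R_true) < 0:
        R_true[:, 0] = -R_true[:, 0]
    t_true = np.array([0.1, -0.2, 5.0])
    X = rng.normal(size=(12, 3))
    X_cam = X @ R_true.T + t_true
    x = X_cam[:, :2] / X_cam[:, 2:3]
    R_est, t_est = pnp_dlt(X, x)
    print("max rotation error:", np.abs(R_est - R_true).max())
    print("max translation error:", np.abs(t_est - t_true).max())
```

In a learning setting, a solver of this kind is typically made differentiable so that gradients can flow from the recovered pose back to the predicted keypoints; the sketch above only illustrates the geometric problem being solved.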