In this paper we present, to the best of our knowledge, the first method to learn a generative model of 3D shapes from natural images in a fully unsupervised way. In particular, we do not use any ground-truth 3D or 2D annotations, stereo video, or ego-motion during training. Our approach follows the general strategy of Generative Adversarial Networks, where an image generator network learns to create image samples that are realistic enough to fool a discriminator network into believing that they are natural images. In contrast to standard GANs, however, our approach splits image generation into two stages. In the first stage, a generator network outputs 3D objects. In the second, a differentiable renderer produces an image of these 3D objects from random viewpoints. The key observation is that a realistic 3D object should yield a realistic rendering from any plausible viewpoint. Thus, by randomizing the choice of viewpoint, our training procedure forces the generator network to learn an interpretable 3D representation disentangled from the viewpoint. In this work, a 3D representation consists of a triangle mesh and a texture map that colors the triangle surface via UV mapping. We analyze our learning approach, expose its ambiguities, and show how to overcome them. Experimentally, we demonstrate that our method can learn realistic 3D shapes of faces using only the natural images of the FFHQ dataset.
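The data flow of the second stage (random viewpoint, projection, UV-mapped coloring) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`sample_texture_bilinear`, `random_rotation`, `project`, `render_vertices`), the yaw/pitch ranges, and the camera distance are hypothetical choices made for illustration. It also does not rasterize triangles; it only projects mesh vertices from a randomly sampled viewpoint and colors them by a bilinear UV lookup into the texture map, whereas a real differentiable renderer would interpolate UVs per pixel inside each triangle.

```python
import numpy as np


def sample_texture_bilinear(texture, uv):
    """Bilinearly sample an (H, W, C) texture at UV coordinates in [0, 1]^2.

    u indexes columns and v indexes rows (a convention; renderers differ).
    In an autodiff framework the same arithmetic is differentiable with
    respect to the texture values. Requires H >= 2 and W >= 2.
    """
    h, w, _ = texture.shape
    x = uv[:, 0] * (w - 1)
    y = uv[:, 1] * (h - 1)
    # Top-left corner of the 2x2 neighborhood, clipped so x0 + 1 and y0 + 1 stay in range.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = texture[y0, x0] * (1 - fx) + texture[y0, x0 + 1] * fx
    bottom = texture[y0 + 1, x0] * (1 - fx) + texture[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def random_rotation(rng, max_yaw=np.pi / 4, max_pitch=np.pi / 12):
    """Sample a camera rotation: uniform yaw about y, then uniform pitch about x.

    The angle ranges are hypothetical stand-ins for "plausible viewpoints".
    """
    yaw = rng.uniform(-max_yaw, max_yaw)
    pitch = rng.uniform(-max_pitch, max_pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    r_yaw = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    r_pitch = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return r_pitch @ r_yaw


def project(vertices, rotation, camera_distance=3.0, focal=1.0):
    """Rotate (N, 3) vertices into camera space and apply a pinhole projection."""
    cam = vertices @ rotation.T
    # Camera sits on the z axis at camera_distance from the origin, looking at it.
    depth = cam[:, 2] + camera_distance
    return focal * cam[:, :2] / depth[:, None]


def render_vertices(vertices, uv, texture, rng):
    """Stage-2 stand-in: project and color vertices from one random viewpoint."""
    rotation = random_rotation(rng)
    xy = project(vertices, rotation)
    colors = sample_texture_bilinear(texture, uv)
    return xy, colors


# Toy "generator output": random vertices in a cube, per-vertex UVs, a random texture.
rng = np.random.default_rng(0)
vertices = rng.uniform(-1.0, 1.0, size=(100, 3))
uv = rng.uniform(0.0, 1.0, size=(100, 2))
texture = rng.uniform(0.0, 1.0, size=(64, 64, 3))
xy, colors = render_vertices(vertices, uv, texture, rng)
```

Because the viewpoint is resampled on every call, the same mesh and texture yield different 2D projections; this is the property that randomized-viewpoint training exploits, since a discriminator sees the object from many views rather than one.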