Deep latent-variable models learn representations of high-dimensional data inan unsupervised manner. A number of recent efforts have focused on learningrepresentations that disentangle statistically independent axes of variation,often by introducing suitable modifications of the objective function. Wesynthesize this growing body of literature by formulating a generalization ofthe evidence lower bound that explicitly represents the trade-offs betweensparsity of the latent code, bijectivity of representations, and coverage ofthe support of the empirical data distribution. Our objective is also suitableto learning hierarchical representations that disentangle blocks of variableswhilst allowing for some degree of correlations within blocks. Experiments on arange of datasets demonstrate that learned representations containinterpretable features, are able to learn discrete attributes, and generalizeto unseen combinations of factors.