The Variational Auto-Encoder (VAE) is a simple, efficient, and popular deep maximum likelihood model. Though usage of VAEs is widespread, the derivation of the VAE is not as widely understood. In this tutorial, we will provide an overview of the VAE and a tour through various derivations and interpretations of the VAE objective. From a probabilistic standpoint, we will examine the VAE through the lens of Bayes' Rule, importance sampling, and the change-of-variables formula. From an information-theoretic standpoint, we will examine the VAE through the lens of lossless compression and transmission through a noisy channel. We will then identify two common misconceptions about the VAE formulation and their practical consequences. Finally, we will visualize the capabilities and limitations of VAEs using a code example (with an accompanying Jupyter notebook) on toy 2D data.