Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
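
For reference, the evidence lower bound referred to above can be written in its standard form, sketched below. The notation is an assumption and is not fixed by the abstract: x denotes a sentence, z a latent parse tree, p_\theta(x, z) the RNNG joint distribution, and q_\phi(z | x) the CRF inference network.

```latex
% Standard ELBO for a latent-tree model (symbols are illustrative assumptions):
%   x      : observed sentence
%   z      : latent parse tree
%   p_\theta : generative model (the RNNG)
%   q_\phi   : inference network (the neural CRF parser)
\begin{align}
\log p_\theta(x)
  &= \log \sum_{z} p_\theta(x, z) \\
  &\ge \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \right] \\
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x, z) \right]
     + \mathbb{H}\!\left[ q_\phi(z \mid x) \right]
\end{align}
```

The sum over z in the first line is the intractable marginalization over trees; the bound replaces it with an expectation under q_\phi, and the gap is the KL divergence from q_\phi(z | x) to the true posterior p_\theta(z | x).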