We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.