Recent language models can generate interesting and grammatically correct text in story generation but often lack plot development and long-term coherence. This paper experiments with a latent vector planning approach based on a TD-VAE (Temporal Difference Variational Autoencoder), using the model for conditioning and reranking in text generation. The results demonstrate strong performance in automatic cloze and swapping evaluations. Human judgments show that stories generated with TD-VAE reranking improve on a GPT-2 medium baseline and perform comparably to a hierarchical LSTM reranking model. Conditioning on the latent vectors proves disappointing and degrades performance in human evaluation because it reduces the diversity of generation, and the models do not learn to progress the narrative. This highlights an important difference between technical task performance (e.g., cloze) and generating interesting stories.