Deep reinforcement learning can match and exceed human performance, but if even minor changes are introduced to the environment, artificial networks often fail to adapt. Humans, meanwhile, are quite adaptable. We hypothesize that this is partly because of how humans use heuristics, and partly because humans can imagine new and more challenging environments to learn from. We have developed a model of hierarchical reinforcement learning that combines both of these elements in a stumbler-strategist network. We test the transfer performance of this network using Wythoff's game, a gridworld environment with a known optimal strategy. We show that combining imagined play with a heuristic (labeling each position as "good" or "bad") both accelerates learning and promotes transfer to novel games, while also improving model interpretability.
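The "known optimal strategy" for Wythoff's game can be illustrated with a short sketch. This is not the stumbler-strategist model. It only shows the ground truth that a learned "good"/"bad" labeling could be compared against, and it rests on several assumptions. We assume the standard rules: a move reduces one coordinate, or both coordinates by the same amount, and the player who reaches (0, 0) wins. We also treat a "cold" position (one that is losing for the player about to move) as a stand-in for a "bad" one. The helper names `cold_positions_dp` and `cold_positions_formula` are hypothetical. The code computes cold positions by backward induction and checks them against the classical closed form (⌊kφ⌋, ⌊kφ⌋ + k), where φ is the golden ratio.

```python
import math


def cold_positions_dp(n):
    """Hypothetical helper: n x n table, True where (x, y) is cold.

    Assumes standard Wythoff rules: decrease x, decrease y, or decrease
    both by the same positive amount; reaching (0, 0) wins. A position is
    cold iff no legal move leads to another cold position.
    """
    cold = [[False] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            # Every position reachable from (x, y) has a smaller coordinate,
            # so it has already been classified by this loop order.
            reaches_cold = (
                any(cold[x - k][y] for k in range(1, x + 1))
                or any(cold[x][y - k] for k in range(1, y + 1))
                or any(cold[x - k][y - k] for k in range(1, min(x, y) + 1))
            )
            cold[x][y] = not reaches_cold
    return cold


def cold_positions_formula(n):
    """Hypothetical helper: cold positions inside an n x n grid via the
    closed form (floor(k*phi), floor(k*phi) + k) and its mirror image."""
    phi = (1 + math.sqrt(5)) / 2
    pairs = set()
    k = 0
    while math.floor(k * phi) < n:
        a = math.floor(k * phi)
        b = a + k  # equals floor(k * phi**2) because phi**2 = phi + 1
        for x, y in ((a, b), (b, a)):
            if x < n and y < n:
                pairs.add((x, y))
        k += 1
    return pairs


if __name__ == "__main__":
    n = 20
    table = cold_positions_dp(n)
    dp_cold = {(x, y) for x in range(n) for y in range(n) if table[x][y]}
    # Backward induction and the closed form should agree on the grid.
    assert dp_cold == cold_positions_formula(n)
    print(sorted(p for p in dp_cold if p[0] <= p[1]))
```

On a 20 × 20 board this lists the cold positions on or above the diagonal, starting (0, 0), (1, 2), (3, 5), (4, 7). An optimal player always moves to one of them.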