Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible for long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states that are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm that back-propagates through only a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT on tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.
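To make the idea of sparse temporal skip connections more concrete, here is a minimal NumPy sketch of top-k attention from the current hidden state to a memory of stored past states. It is an illustration under stated assumptions, not the authors' implementation. The dot-product scoring, the softmax over only the k highest scores, the function name `sparse_attend`, and the parameters `memory` and `k` are all hypothetical choices made for this example. The sketch shows that only the k selected past states contribute to the output. As a result, a backward pass through this step reaches those k states directly instead of replaying every intermediate step.

```python
import numpy as np

def sparse_attend(h_t, memory, k):
    """Hypothetical top-k sparse attention (illustrative sketch only).

    h_t:    current hidden state, shape (d,)
    memory: stored past hidden states, shape (T, d)
    k:      number of past states to attend to (k >= 1)

    Returns the attended summary (d,), the selected indices (k,),
    and the attention weights over those indices (k,).
    """
    d = h_t.shape[0]
    scores = memory @ h_t / np.sqrt(d)        # relevance of each past state, shape (T,)
    k = min(k, scores.shape[0])
    top = np.argpartition(scores, -k)[-k:]    # indices of the k highest scores (unordered)
    top = top[np.argsort(scores[top])[::-1]]  # order them by descending score
    s = scores[top] - scores[top].max()       # shift for a numerically stable softmax
    w = np.exp(s) / np.exp(s).sum()           # weights over the k selected states only
    summary = w @ memory[top]                 # weighted sum of the selected past states
    return summary, top, w

# Usage: 100 random "past states" of width 8 (made-up data for illustration).
rng = np.random.default_rng(0)
memory = rng.normal(size=(100, 8))
h_t = rng.normal(size=8)
summary, idx, w = sparse_attend(h_t, memory, k=3)
```

The top-k selection is piecewise constant, so small changes to an unselected past state leave the output unchanged. In an autodiff framework, the gradient of this attention step with respect to such a state is therefore zero, and credit flows only along the k skip connections to the selected states.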