Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs, can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment.
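To make the setting concrete, the following is a minimal sketch of a piecewise stationary source and of a predictor that performs exact Bayesian inference over both the latent switching-points and the latent segment parameter. It is an illustration under assumptions that are not taken from the text above: segments are Bernoulli, each segment's bias is drawn uniformly from [0, 1], and a switch occurs before each symbol with a constant, known probability. All function names are hypothetical, and this is not necessarily one of the Bayes-optimal algorithms considered in this work. For comparison, the sketch also includes the Laplace rule, which is the Bayes predictor for a single stationary segment and ignores switches.

```python
import math
import random


def sample_piecewise_bernoulli(length, switch_prob, rng):
    """Draw bits from a Bernoulli source whose bias is redrawn at hidden switch points."""
    bias = rng.random()
    bits = []
    for _ in range(length):
        if rng.random() < switch_prob:  # unobserved switching-point
            bias = rng.random()
        bits.append(1 if rng.random() < bias else 0)
    return bits


def laplace_log_loss(bits):
    """Cumulative log loss (nats) of the Laplace rule, which assumes no switches."""
    ones = zeros = 0
    loss = 0.0
    for b in bits:
        p_one = (ones + 1.0) / (ones + zeros + 2.0)
        loss -= math.log(p_one if b == 1 else 1.0 - p_one)
        ones += b
        zeros += 1 - b
    return loss


def switching_bayes_log_loss(bits, switch_prob):
    """Cumulative log loss (nats) of the exact Bayes mixture for the assumed source.

    Each hypothesis is (posterior weight, ones, zeros), where the counts cover
    the symbols seen since the hypothesised most recent switch.
    """
    hyps = [(1.0, 0, 0)]
    loss = 0.0
    for b in bits:
        # Prior step: every hypothesis survives with probability 1 - switch_prob;
        # the remaining mass moves to a fresh segment with empty counts.
        hyps = [(w * (1.0 - switch_prob), o, z) for w, o, z in hyps]
        hyps.append((switch_prob, 0, 0))
        # Predictive step: mix the per-hypothesis Laplace predictions.
        scored = []
        p_b = 0.0
        for w, o, z in hyps:
            p_one = (o + 1.0) / (o + z + 2.0)
            lik = p_one if b == 1 else 1.0 - p_one
            scored.append((w * lik, o + b, z + 1 - b))
            p_b += w * lik
        loss -= math.log(p_b)
        # Posterior step: renormalise so the weights sum to one again.
        hyps = [(w / p_b, o, z) for w, o, z in scored]
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    bits = sample_piecewise_bernoulli(500, 0.02, rng)
    print("Laplace (no switches):", laplace_log_loss(bits))
    print("Switching Bayes mixture:", switching_bayes_log_loss(bits, 0.02))
```

The mixture keeps one hypothesis per possible position of the most recent switch, so its cost grows quadratically with sequence length; this keeps the sketch exact and short at the expense of efficiency. A sequence model trained with the log loss on samples from such a source would, under the interpretation described above, be expected to approach the cumulative loss of this mixture rather than that of the Laplace rule.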