Abstract
Federated learning (FL) is now recognized as a key framework for communication-efficient collaborative learning. Most theoretical and empirical studies, however, rely on the assumption that clients have access to pre-collected data sets, with limited investigation into scenarios where clients continuously collect data. In many real-world applications, particularly when data is generated by physical or biological processes, client data streams are often modeled by non-stationary Markov processes. Unlike standard i.i.d. sampling, the performance of FL with Markovian data streams remains poorly understood due to the statistical dependencies between client samples over time. In this paper, we investigate whether FL can still support collaborative learning with Markovian data streams. Specifically, we analyze the performance of Minibatch SGD, Local SGD, and a variant of Local SGD with momentum. We answer affirmatively under standard assumptions and smooth non-convex client objectives: the sample complexity is proportional to the inverse of the number of clients with a communication complexity comparable to the i.i.d. scenario. However, the sample complexity for Markovian data streams remains higher than for i.i.d. sampling.