Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces

Abstract

Clustered federated learning (FL) has been shown to produce promising resultsby grouping clients into clusters. This is especially effective in scenarioswhere separate groups of clients have significant differences in thedistributions of their local data. Existing clustered FL algorithms areessentially trying to group together clients with similar distributions so thatclients in the same cluster can leverage each other's data to better performfederated learning. However, prior clustered FL algorithms attempt to learnthese distribution similarities indirectly during training, which can be quitetime consuming as many rounds of federated learning may be required until theformation of clusters is stabilized. In this paper, we propose a new approachto federated learning that directly aims to efficiently identify distributionsimilarities among clients by analyzing the principal angles between the clientdata subspaces. Each client applies a truncated singular value decomposition(SVD) step on its local data in a single-shot manner to derive a small set ofprincipal vectors, which provides a signature that succinctly captures the maincharacteristics of the underlying distribution. This small set of principalvectors is provided to the server so that the server can directly identifydistribution similarities among the clients to form clusters. This is achievedby comparing the similarities of the principal angles between the client datasubspaces spanned by those principal vectors. The approach provides a simple,yet effective clustered FL framework that addresses a broad range of dataheterogeneity issues beyond simpler forms of Non-IIDness like label skews. Ourclustered FL approach also enables convergence guarantees for non-convexobjectives. Our code is available at https://github.com/MMorafah/PACFL.

Quick Read (beta)

loading the full paper ...