Affiliation(s): College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; School of Public Affairs, Zhejiang University, Hangzhou 310027, China; Tongdun Technology, Hangzhou 310000, China; Institute of Basic Medicine and Cancer, Chinese Academy of Sciences, Hangzhou 310018, China; ElasticMind.AI Technology Inc., Hangzhou 310018
Received: 2022-06-21
Accepted: 2023-08-29
Available online: 2023-08-29
Abstract
To leverage the enormous amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning called federated unsupervised representation learning (FURL), which aims to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) data distribution shift among clients (non-independent and identically distributed data, non-IID) makes local models focus on different categories, leading to inconsistent representation spaces; (2) without unified information shared among the clients in FURL, the representations across clients become misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA is composed of two key modules: a dictionary module that aggregates the representations of samples from each client and shares them with all clients to keep the representation space consistent, and an alignment module that aligns the representation of each client with a base model trained on public data. We adopt the contrastive approach for local model training. Through extensive experiments with three evaluation protocols in IID and non-IID settings, we demonstrate that FedCA outperforms all baselines by significant margins.
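To make the two modules more concrete, the following is a minimal sketch of how a client's local objective could combine a contrastive term that draws extra negatives from a shared dictionary with a term that pulls local representations toward those of a base model. It is not the authors' implementation: the function names, the InfoNCE form of the contrastive loss, the squared-distance alignment term, the temperature, and the weight `beta` are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_with_dictionary(z_a, z_b, dictionary, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    z_a[i] and z_b[i] are representations of two views of sample i and form
    the positive pair. The other rows of z_b and every row of `dictionary`
    (representations shared across clients) serve as negatives.
    """
    z_a, z_b, dictionary = map(l2_normalize, (z_a, z_b, dictionary))
    candidates = np.concatenate([z_b, dictionary], axis=0)   # (n + k, d)
    logits = z_a @ candidates.T / temperature                # (n, n + k)
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z_a.shape[0]
    return -log_prob[np.arange(n), np.arange(n)].mean()


def alignment_loss(z_local, z_base):
    """Mean squared distance between local and base-model representations
    of the same inputs (hypothetical choice of alignment term)."""
    diff = l2_normalize(z_local) - l2_normalize(z_base)
    return np.mean(np.sum(diff ** 2, axis=1))


def local_objective(z_a, z_b, dictionary, z_local, z_base, beta=1.0):
    """Contrastive term plus a weighted alignment term; beta is an assumed weight."""
    return (info_nce_with_dictionary(z_a, z_b, dictionary)
            + beta * alignment_loss(z_local, z_base))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))             # one batch of local representations
    shared = rng.normal(size=(32, 16))       # stand-in for the shared dictionary
    base = z + 0.1 * rng.normal(size=z.shape)  # stand-in for base-model outputs
    print(local_objective(z, z, shared, z, base))
```

In this sketch the dictionary only enlarges the set of negatives, so every client contrasts against the same shared points, while the alignment term ties each client's representation space to a common reference. How FedCA builds and updates the dictionary and how it defines the alignment objective are specified in the paper itself.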