Electronics, Vol. 12, Pages 1660: A Clustered Federated Learning Method of User Behavior Analysis Based on Non-IID Data
Electronics doi: 10.3390/electronics12071660
Authors: Jianfei Zhang Zhongxin Li
Federated learning (FL) is a novel distributed machine learning paradigm. It can protect data privacy in distributed machine learning. Hence, FL provides new ideas for user behavior analysis. User behavior analysis can be modeled using multiple data sources. However, differences between different data sources can lead to different data distributions, i.e., non-identically and non-independently distributed (Non-IID). Non-IID data usually introduce bias in the training process of FL models, which will affect the model accuracy and convergence speed. In this paper, a new federated learning algorithm is proposed to mitigate the impact of Non-IID data on the model, named federated learning with a two-tier caching mechanism (FedTCM). First, FedTCM clustered similar clients based on their data distribution. Clustering reduces the extent of Non-IID between clients in a cluster. Second, FedTCM uses asynchronous communication methods to alleviate the problem of inconsistent computation speed across different clients. Finally, FedTCM sets up a two-tier caching mechanism on the server for mitigating the Non-IID data between different clusters. In multiple simulated datasets, compared to the method without the federated framework, the FedTCM is maximum 15.8% higher than it and average 12.6% higher than it. Compared to the typical federated method FedAvg, the accuracy of FedTCM is maximum 2.3% higher than it and average 1.6% higher than it. Additionally, FedTCM achieves more excellent communication performance than FedAvg.