Mobile keyboard suggestion is typically regarded as a word-level language modeling problem. Centralized machine learning techniques require massive amounts of user data to be collected for training, which raises privacy concerns because users' typing data is sensitive and personal. Federated learning (FL) provides a promising approach to learning private language models for intelligent, personalized keyboard suggestion by training models on distributed clients rather than on a central server. To obtain a global model for prediction, existing FL algorithms simply average the client models and ignore the importance of each client during model aggregation. Furthermore, no optimization is performed on the central server to learn a well-generalized global model. To solve these problems, we propose a novel model aggregation scheme that uses an attention mechanism to account for the contribution of each client model to the global model, together with an optimization technique applied during server aggregation. Our proposed attentive aggregation method minimizes the weighted distance between the server model and the client models through iterative parameter updates, while attending to the distances between the server model and the client models. In experiments on two popular language modeling datasets and a social media dataset, our proposed method outperforms its counterparts in terms of perplexity and communication cost in most of the compared settings.
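The aggregation step summarized above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made for illustration only: models are represented as a hypothetical mapping from layer name to a flat list of floats; attention weights are a layer-wise softmax over the L2 distances between the server layer and each client layer; and one server update is a single gradient step, with a hypothetical `step_size`, on the weighted squared distance 0.5 * sum_k alpha_k * ||w_server - w_k||^2.

```python
import math
from typing import Dict, List

# Hypothetical model representation: layer name -> flat list of parameters.
Params = Dict[str, List[float]]


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two flat parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_aggregate(server: Params, clients: List[Params],
                        step_size: float = 1.0) -> Params:
    """One assumed attentive-aggregation update of the server model.

    For each layer, attention weights come from a softmax over the
    server-to-client distances; the server layer then takes one gradient
    step on 0.5 * sum_k alpha_k * ||w_server - w_k||^2.
    """
    new_server: Params = {}
    for layer, w_s in server.items():
        dists = [l2_distance(w_s, c[layer]) for c in clients]
        attn = softmax(dists)  # assumed attention scores, one per client
        # Gradient of the weighted squared distance w.r.t. w_s.
        grad = [
            sum(a * (w_s[i] - c[layer][i]) for a, c in zip(attn, clients))
            for i in range(len(w_s))
        ]
        new_server[layer] = [w - step_size * g for w, g in zip(w_s, grad)]
    return new_server


if __name__ == "__main__":
    server = {"fc": [0.0, 2.0]}
    clients = [{"fc": [1.0, 1.0]}]
    print(attentive_aggregate(server, clients, step_size=0.5))
```

With a single client and `step_size=1.0`, this update moves the server exactly onto the client. With equidistant clients it reduces to plain averaging, so the attention weights only change the result when the client models differ in their distance from the server.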