CMU-CS-23-127
Computer Science Department, School of Computer Science, Carnegie Mellon University
Scalable and Trustworthy Learning in Heterogeneous Networks
Tian Li
Ph.D. Thesis
August 2023
Developing machine learning models heavily relies on access to data. To build a responsible data economy and protect data ownership, it is crucial to enable learning models from separate, heterogeneous data sources without centralization. Federated learning (FL) aims to train models collectively across massive remote devices or isolated organizations, while keeping user data local. However, federated networks introduce a number of challenges beyond traditional distributed learning scenarios. While FL has shown great promise for enabling edge applications, current FL systems are hindered by several constraints. In addition to being accurate, federated methods must scale to potentially massive and heterogeneous networks of devices, and must exhibit trustworthy behavior, addressing pragmatic concerns related to issues such as fairness, robustness, and user privacy.

In this thesis, we aim to address the practical challenges of federated learning in a principled fashion. We study how heterogeneity lies at the center of the constraints of federated learning: not only does it affect the accuracy of the models, it also competes with other critical metrics such as fairness, robustness, and privacy. To address these metrics, we develop new, scalable learning objectives and algorithms that rigorously account for and address sources of heterogeneity. In particular, in terms of accuracy, we propose novel federated optimization frameworks with convergence guarantees under realistic heterogeneity assumptions. In terms of trustworthiness, we develop and analyze fair learning objectives that offer flexible fairness/utility tradeoffs. We consider the joint constraints between fairness and robustness, and explore personalized FL to provably address both of them simultaneously. Finally, we study new differentially private optimization methods with improved convergence behavior, achieving state-of-the-art performance under privacy constraints. Although our work is grounded in the application of federated learning, we show that many of the techniques and fundamental tradeoffs extend well beyond this use case to more general applications of large-scale and trustworthy machine learning.
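As a concrete illustration of the kind of heterogeneity-aware federated optimization the abstract alludes to, the sketch below shows a proximal-regularized local update, in which each client minimizes its own loss plus a term that keeps its model close to the current global model. This is a minimal sketch only; the specific objective (least squares), the function names (local_update, server_round), and the hyperparameters (mu, lr, num_steps) are illustrative assumptions and are not taken from this report page.

```python
# Minimal sketch (not from the report): a proximal-regularized local update for
# federated learning. Each client minimizes its local loss plus
# (mu/2)*||w - w_global||^2, which limits "client drift" under heterogeneous data.
import numpy as np

def local_update(w_global, X, y, mu=0.1, lr=0.01, num_steps=20):
    """One client's local training: gradient steps on a least-squares loss
    plus a proximal term keeping w close to the current global model."""
    w = w_global.copy()
    for _ in range(num_steps):
        grad_loss = X.T @ (X @ w - y) / len(y)   # gradient of 0.5*||Xw - y||^2 / n
        grad_prox = mu * (w - w_global)          # gradient of (mu/2)*||w - w_global||^2
        w -= lr * (grad_loss + grad_prox)
    return w

def server_round(w_global, client_data, mu=0.1):
    """One communication round: clients train locally, server averages the results."""
    updates = [local_update(w_global, X, y, mu=mu) for X, y in client_data]
    return np.mean(updates, axis=0)

# Toy usage: three clients with differently distributed (heterogeneous) data.
rng = np.random.default_rng(0)
d = 5
clients = []
for shift in (0.0, 1.0, -1.0):
    X = rng.normal(shift, 1.0, size=(50, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(d)
for _ in range(30):
    w = server_round(w, clients, mu=0.1)
```

Setting mu = 0 recovers plain local gradient steps with simple averaging, while larger mu ties each client more tightly to the global model; tuning this tradeoff is one way such frameworks cope with data heterogeneity across clients.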
296 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department