Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-23-127

Scalable and Trustworthy Learning in
Heterogeneous Networks

Tian Li

Ph.D. Thesis

August 2023



Keywords: Distributed optimization, trustworthy learning, federated learning

Developing machine learning models heavily relies on access to data. To build a responsible data economy and protect data ownership, it is crucial to enable learning from separate, heterogeneous data sources without centralizing them. Federated learning (FL) aims to train models collectively across massive networks of remote devices or isolated organizations while keeping user data local. However, federated networks introduce a number of challenges beyond those of traditional distributed learning. While FL has shown great promise for enabling edge applications, current FL systems are hindered by several constraints. In addition to being accurate, federated methods must scale to potentially massive and heterogeneous networks of devices, and must exhibit trustworthy behavior, addressing pragmatic concerns such as fairness, robustness, and user privacy.

In this thesis, we aim to address the practical challenges of federated learning in a principled fashion. We study how heterogeneity lies at the center of the constraints of federated learning: it not only affects the accuracy of the models, but also competes with other critical metrics such as fairness, robustness, and privacy. To address these metrics, we develop new, scalable learning objectives and algorithms that rigorously account for and address sources of heterogeneity. In particular, in terms of accuracy, we propose novel federated optimization frameworks with convergence guarantees under realistic heterogeneity assumptions. In terms of trustworthiness, we develop and analyze fair learning objectives that offer flexible fairness/utility tradeoffs. We consider the joint constraints between fairness and robustness, and explore personalized FL to provably address both simultaneously. Finally, we study new differentially private optimization methods with improved convergence behavior, achieving state-of-the-art performance under privacy constraints.
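As a concrete illustration (a hedged sketch drawing on the author's related q-FFL work, not a formulation quoted verbatim from this thesis), such a flexible fairness/utility tradeoff can be realized by reweighting per-device losses with an exponent q >= 0, where F_k is device k's local empirical loss and p_k its relative weight:

    \min_{w} \; f_q(w) = \sum_{k=1}^{N} \frac{p_k}{q+1}\, F_k(w)^{q+1}

Setting q = 0 recovers the standard weighted-average federated objective, while larger q places more weight on devices with higher loss, trading average utility for more uniform performance across the network.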

Although our work is grounded in the application of federated learning, we show that many of the techniques and fundamental tradeoffs extend well beyond this use case to more general applications of large-scale and trustworthy machine learning.

296 pages

Thesis Committee:
Virginia Smith (Chair)
Tianqi Chen
Ameet Talwalkar
H. Brendan McMahan (Google Research)
Dawn Song (University of California, Berkeley)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

