Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-24-145

The Learning of Algorithms and Architectures

Mikhail Khodak

Ph.D. Thesis

August 2024

CMU-CS-24-145.pdf
Pending


Keywords: Machine learning, meta-learning, algorithms with predictions, architecture search

How should we design the algorithms we run and the architectures we learn? Several high-impact areas of computing have begun to automate these procedures using machine learning (ML), reducing the need for human effort by exploiting ever-growing amounts of data and compute. This thesis uses ideas from ML, algorithm design, and optimization to advance our understanding of three such areas of data-driven computing (meta-learning, algorithms with predictions, and architecture search) and to translate the resulting methodologies into state-of-the-art implementations.

  • In meta-learning, which uses ML itself to improve ML algorithms by learning across many learning tasks, we introduce ARUBA, a framework for designing and analyzing meta-learning methods. Our analysis yields the first guarantees for gradient-based meta-learning, showing how such methods improve performance based on quantifiable measures of similarity between learning tasks (a toy sketch of this idea appears after this list). We use ARUBA to extend the practical impact of meta-learning to new areas of ML, including learning with partial feedback and federated learning; for the latter, we introduce FedEx, a new state-of-the-art method for tuning federated optimizers, which train models over networks of distributed, heterogeneous datasets such as mobile devices and hospital records.
  • We build upon the success of ARUBA by taking its core approach, the optimization of surrogate loss functions that approximate algorithmic objectives, and extending it beyond learning algorithms to show learning guarantees for algorithms with predictions, i.e., algorithms that take advantage of ML predictions about their instances. In particular, we show the first learning-theoretic guarantees for predictions that depend on the instance the algorithm is run on, a crucial property for practical applications. Our framework again serves as an algorithm design tool, which we use to build the first algorithms with predictions for mechanisms that release (differentially) private statistics about sensitive datasets and for linear system solvers (see the second sketch after this list); in the latter case, we design learning algorithms that, under natural structural assumptions, can learn to make instance-optimal predictions.
  • Lastly, this thesis addresses architecture search, the problem of finding neural network architectures to train on specific learning tasks, where we make progress towards understanding the optimization and generalization properties of weight-sharing, a dominant heuristic used throughout the field (see the third sketch after this list). We then extend weight-sharing to design new search spaces built around neural operations that allow for the automated discovery of truly novel architectures from data; the culmination of this effort is DASH, a method that efficiently finds architectures that outperform human expert-designed neural architectures on the majority of the diverse tasks we test.
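
To make the first bullet concrete, here is a minimal, hypothetical sketch (not the thesis's actual algorithm) of gradient-based meta-learning in Python, in the style of a Reptile-type update: when task optima cluster together, the learned initialization converges toward their shared center, so within-task adaptation becomes cheaper. The quadratic task loss, task distribution, and all names are illustrative assumptions.

    # Hypothetical sketch of gradient-based meta-learning (Reptile-style):
    # learn an initialization that adapts quickly across related tasks.
    import numpy as np

    rng = np.random.default_rng(0)

    def task_loss_grad(w, task_target):
        """Gradient of a toy quadratic task loss ||w - target||^2 / 2."""
        return w - task_target

    def inner_adapt(w_init, task_target, steps=5, lr=0.1):
        """A few steps of within-task gradient descent from the meta-initialization."""
        w = w_init.copy()
        for _ in range(steps):
            w -= lr * task_loss_grad(w, task_target)
        return w

    # Tasks are "similar": their optima cluster around a shared center,
    # so a learned initialization should land near that center.
    center = np.array([3.0, -2.0])
    tasks = [center + 0.5 * rng.standard_normal(2) for _ in range(200)]

    meta_w = np.zeros(2)
    meta_lr = 0.05
    for target in tasks:
        adapted = inner_adapt(meta_w, target)
        # Reptile-style meta-update: move the initialization toward the adapted weights.
        meta_w += meta_lr * (adapted - meta_w)

    print("learned initialization:", meta_w)  # ~ [3, -2], the task-similarity center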
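A similarly hypothetical sketch of the algorithms-with-predictions idea for linear system solvers, from the second bullet: a learned prediction warm-starts an iterative solver, and the prediction is trained on a convex surrogate loss (squared distance to the true solution) that controls the solver's iteration count. The fixed diagonal system matrix, gradient-descent solver, and instance distribution are all assumptions made for illustration.

    # Hypothetical sketch of "algorithms with predictions" for a linear solver:
    # a predicted solution warm-starts the solver, and the predictor is trained
    # on a surrogate loss (initial error) that governs the iteration cost.
    import numpy as np

    rng = np.random.default_rng(1)
    A = np.diag(np.linspace(1.0, 10.0, 5))  # fixed, well-posed system matrix

    def solve_gd(A, b, x0, tol=1e-6, max_iter=10_000):
        """Gradient descent on the least-squares objective; returns x and iterations used."""
        x = x0.copy()
        lr = 1.0 / np.linalg.norm(A, 2) ** 2
        for it in range(max_iter):
            r = A @ x - b
            if np.linalg.norm(r) < tol:
                return x, it
            x -= lr * (A.T @ r)
        return x, max_iter

    # Instances share structure: b is drawn near a fixed mean, so a single
    # learned prediction can be good for the whole distribution.
    b_mean = rng.standard_normal(5)
    pred = np.zeros(5)
    for _ in range(500):
        b = b_mean + 0.1 * rng.standard_normal(5)
        x_star = np.linalg.solve(A, b)
        # Surrogate loss ||pred - x*||^2 is convex and bounds the solver's work.
        pred -= 0.05 * 2 * (pred - x_star)

    b_test = b_mean + 0.1 * rng.standard_normal(5)
    _, cold = solve_gd(A, b_test, np.zeros(5))
    _, warm = solve_gd(A, b_test, pred)
    print(f"iterations from cold start: {cold}, with learned prediction: {warm}")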
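Finally, a toy, hypothetical illustration of the weight-sharing heuristic from the third bullet: candidate operations on an edge share parameters inside a single model, a softmax over architecture logits mixes their outputs, and the shared weights and architecture parameters are trained together. The two-operation edge, synthetic data, and numerical gradients are assumptions made to keep the example short and self-contained.

    # Hypothetical sketch of weight-sharing for architecture search: candidate
    # operations share one weight vector, and architecture logits mix their outputs.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((64, 3))
    y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))  # data generated by a tanh operation

    w = rng.standard_normal(3) * 0.1  # weights shared by both candidate ops
    alpha = np.zeros(2)               # architecture logits for [linear, tanh]

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def loss(w, alpha):
        theta = softmax(alpha)
        pred = theta[0] * (X @ w) + theta[1] * np.tanh(X @ w)  # mixed edge output
        return np.mean((pred - y) ** 2)

    def num_grad(f, v, eps=1e-5):
        """Central-difference gradient, to avoid an autodiff dependency."""
        g = np.zeros_like(v)
        for i in range(v.size):
            d = np.zeros_like(v)
            d[i] = eps
            g[i] = (f(v + d) - f(v - d)) / (2 * eps)
        return g

    # Alternate shared-weight updates and architecture updates.
    for _ in range(500):
        w -= 0.1 * num_grad(lambda v: loss(v, alpha), w)
        alpha -= 0.1 * num_grad(lambda a: loss(w, a), alpha)

    print("op mixture [linear, tanh]:", softmax(alpha))  # tanh is expected to dominate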

    420 pages

    Thesis Committee:
    Maria-Florina Balcan (Co-Chair)
    Ameet Talwalkar (Co-Chair)
    Tom Mitchell
    Peter Bartlett (University of California, Berkeley)
    Alexander Smola (Boson AI)

    Srinivasan Seshan, Head, Computer Science Department
    Martial Hebert, Dean, School of Computer Science

