CMU-ML-20-104
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Towards Efficient Automated Machine Learning

Liam Li

May 2020

Ph.D. Thesis



Keywords: AutoML, Hyperparameter Optimization, Neural Architecture Search


Machine learning is widely used across a variety of disciplines to develop predictive models for variables of interest. However, building such solutions is a time-consuming and challenging process that requires highly trained data scientists and domain experts. In response, the field of automated machine learning (AutoML) aims to reduce human effort and speed up the development cycle through automation.

Due to the ubiquity of hyperparameters in machine learning algorithms and the impact that a well-tuned hyperparameter configuration can have on predictive performance, hyperparameter optimization is a core problem in AutoML. More recently, the rise of deep learning has motivated neural architecture search (NAS), a specialized hyperparameter optimization problem focused on automating the design of neural networks. Naive approaches to hyperparameter optimization, such as grid search and random search, are computationally intractable for large-scale tuning problems. Consequently, this thesis focuses on developing efficient and principled methods for hyperparameter optimization and NAS.
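
To make the cost of these naive approaches concrete, the following is a minimal Python sketch of random search; it is illustrative only and not taken from the thesis. The search space, its hyperparameter names and ranges, and the train_and_evaluate function are hypothetical placeholders.

    import math
    import random

    # Hypothetical search space; the hyperparameter names and ranges are illustrative.
    SEARCH_SPACE = {
        "learning_rate": (1e-4, 1e-1),     # sampled log-uniformly
        "batch_size": [32, 64, 128, 256],  # sampled uniformly at random
        "num_layers": [2, 3, 4, 5],
    }

    def sample_config():
        """Draw one random configuration from the search space."""
        lo, hi = SEARCH_SPACE["learning_rate"]
        return {
            "learning_rate": math.exp(random.uniform(math.log(lo), math.log(hi))),
            "batch_size": random.choice(SEARCH_SPACE["batch_size"]),
            "num_layers": random.choice(SEARCH_SPACE["num_layers"]),
        }

    def random_search(train_and_evaluate, num_trials=50):
        """Return the best of num_trials randomly sampled configurations.

        train_and_evaluate(config) is a hypothetical placeholder that trains a
        model to completion and returns a validation score (higher is better).
        """
        best_config, best_score = None, float("-inf")
        for _ in range(num_trials):
            config = sample_config()
            score = train_and_evaluate(config)  # one full training run per trial
            if score > best_score:
                best_config, best_score = config, score
        return best_config, best_score

Because every trial trains a model to completion, the total cost grows linearly with the number of trials, which is what makes such naive search intractable at scale.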

In particular, we make progress towards answering the following questions with the aim of developing algorithms for more efficient and effective automated machine learning:

1. Hyperparameter Optimization

  (a) How can we effectively use early-stopping to speed up hyperparameter optimization? (A sketch of one classical early-stopping scheme follows this list.)
  (b) How can we exploit parallel computing to perform hyperparameter optimization in the same time it takes to train a single model in the sequential setting?
  (c) For multi-stage machine learning pipelines, how can we exploit the structure of the search space to reduce total computational cost?

2. Neural Architecture Search

  (a) What is the gap in performance between state-of-the-art weight-sharing NAS methods and random search baselines?
  (b) How can we develop more principled weight-sharing methods with provably faster convergence rates and improved empirical performance?
  (c) Does the weight-sharing paradigm commonly used in NAS have applications to more general hyperparameter optimization problems?
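
To illustrate the flavor of Problem 1a, the following is a minimal sketch of successive halving, a classical early-stopping scheme; it is illustrative only and not a description of the algorithms developed in the thesis. The train_partially function is a hypothetical placeholder, and the sketch ignores practical details such as checkpointing and resuming partially trained models.

    def successive_halving(configs, train_partially, eta=3, min_resource=1):
        """Sketch of successive halving over a fixed set of configurations.

        configs: list of candidate hyperparameter configurations.
        train_partially(config, resource): hypothetical placeholder that trains
            `config` for `resource` units (e.g. epochs) and returns a validation
            score (higher is better).
        eta: only the top 1/eta configurations survive each rung.
        """
        survivors = list(configs)
        resource = min_resource
        while len(survivors) > 1:
            # Evaluate every surviving configuration with the current small budget.
            scored = [(train_partially(cfg, resource), cfg) for cfg in survivors]
            scored.sort(key=lambda pair: pair[0], reverse=True)
            # Early-stop the rest: only the top 1/eta of configurations advance
            # to an eta-times larger training budget in the next rung.
            keep = max(1, len(survivors) // eta)
            survivors = [cfg for _, cfg in scored[:keep]]
            resource *= eta
        return survivors[0]

Because most configurations are discarded after only a small amount of training, the total cost grows far more slowly with the number of configurations than it does for naive random search.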

Given these problems, this thesis is organized into two parts. The first part focuses on progress we have made towards efficient hyperparameter optimization by addressing Problems 1a, 1b, and 1c. The second part focuses on progress we have made towards understanding and improving weight-sharing for neural architecture search and beyond by addressing Problems 2a, 2b, and 2c.

184 pages

Thesis Committee:
Ameet Talwalkar (Chair)
Maria-Florina Balcan
Jeff Schneider
Kevin Jamieson (University of Washington)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science

