CMU-ML-18-110
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Tuning Hyperparameters without Grad Students:
Scaling up Bandit Optimisation

Kirthevasan Kandasamy

October 2018

Ph.D. Thesis



Keywords: Decision making under uncertainty, Bandits, Bandit Optimisation, Bayesian Optimisation


This thesis explores scalable methods for adaptive decision making under uncertainty in stateless environments, where an agent designs an experiment, observes the outcome, and plans subsequent experiments so as to achieve a desired goal. Typically, each experiment incurs a large computational or economic cost, so the number of experiments must be kept to a minimum. Many such problems fall under the bandit framework, where the outcome of each experiment can be viewed as a reward signal and the goal is to optimise this reward, i.e., to find the design that maximises it. A common use case for bandits, pervasive in industrial and scientific applications, is hyperparameter tuning, where we must find the optimal configuration of a black-box system by tuning the several knobs that affect its performance. Applications include statistical model selection, materials design, optimal policy selection in robotics, and maximum likelihood inference in simulation-based scientific models. More generally, bandits are but one class of problems studied under the umbrella of adaptive decision making under uncertainty in stateless environments. Active learning and design of experiments are other examples of adaptive decision making, but unlike in bandits, progress towards the desired goal is not conveyed to the agent via a reward signal.
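To make the bandit formulation above concrete, the following is a minimal sketch of Bayesian optimisation with a Gaussian-process upper-confidence-bound (GP-UCB) selection rule, a standard strategy in this setting. It is not code from the thesis; the kernel, length-scale, and function names are illustrative assumptions for a one-dimensional design space.

```python
import numpy as np

def rbf_kernel(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets (assumed length-scale 0.2)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    """GP posterior mean and standard deviation at query points Xq."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)      # prior variance is 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 0.0))

def gp_ucb_optimise(f, n_rounds=20, beta=2.0, seed=0):
    """Sequentially run the experiment maximising mu + beta * sigma (GP-UCB)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 200)        # candidate designs
    X = list(rng.uniform(0.0, 1.0, size=2))  # two random initial experiments
    y = [f(x) for x in X]
    for _ in range(n_rounds):
        mu, sigma = gp_posterior(np.array(X), np.array(y), grid)
        x_next = grid[np.argmax(mu + beta * sigma)]
        X.append(x_next)
        y.append(f(x_next))
    best = X[int(np.argmax(y))]
    return best, max(y)
```

The upper-confidence-bound rule trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is why such methods can locate a good design in far fewer evaluations than exhaustive search.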

With increasingly expensive experiments and demands to optimise over complex input spaces, bandit optimisation tasks face new challenges today. At the same time, there are new opportunities that have not previously been exploited. This thesis studies the following questions so as to enable the application of bandit methods, and of adaptive decision-making methods more broadly, to modern real-world applications.

- Conventional bandit methods work reliably in low-dimensional settings, but scale poorly with input dimensionality. Scaling them to high-dimensional domains requires addressing several computational and statistical challenges.
- In many applications, an expensive experiment can be cheaply approximated. We study techniques that use information from these cheap, lower-fidelity approximations to speed up the overall optimisation process.
- Conventional bandit methods are inherently sequential. We study parallelisation techniques that deploy several experiments at the same time.
- Typical methods assume that a design can be characterised by a Euclidean vector. We study bandit methods on graph-structured spaces. As a specific application, we study neural architecture search, which optimises the structure of a neural network by viewing it as a directed graph with node labels and node weights.
- Current methods for adaptive data collection are designed for specific tasks and have limited applicability in problems with complex, application-specific goals. We study a general framework for sequential design of experiments that lets users specify their goal and incorporate other domain expertise.
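The multi-fidelity point above, using cheap approximations to triage expensive experiments, can be illustrated with a successive-halving-style screen. This is a generic sketch of the idea, not one of the thesis's algorithms; the `evaluate(config, budget)` interface and the budget schedule are assumptions made for the example.

```python
def successive_halving(configs, evaluate, budgets=(1, 3, 9)):
    """Screen configurations at increasing fidelities (budgets),
    keeping the better-scoring half at each stage.

    `evaluate(config, budget)` returns a (possibly noisy) score; larger
    budgets are assumed to give more accurate but costlier estimates.
    Configs must be hashable, since they are used as dictionary keys.
    """
    survivors = list(configs)
    for budget in budgets:
        scores = {c: evaluate(c, budget) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

For instance, with `evaluate` returning validation accuracy after `budget` training epochs, most hyperparameter configurations are discarded after a single cheap epoch, and only the most promising ones receive the full training budget.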

We first delve into the above topics in the bandit framework and then study how they extend to broader decision-making problems. We develop methods with theoretical guarantees that also enjoy strong empirical performance. As part of this thesis, we also develop an open-source Python framework for scalable and robust bandit optimisation.

324 pages

Thesis Committee:
Barnabás Póczos (Co-Chair)
Jeff Schneider (Co-Chair)
Aarti Singh
Zoubin Ghahramani (University of Cambridge)

Roni Rosenfeld, Head, Machine Learning Department
Andrew W. Moore, Dean, School of Computer Science


SCS Technical Report Collection
School of Computer Science