CMU-ML-19-106
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Anytime Prediction and Learning for
the Balance between Computation and Accuracy

Hanzhang Hu

April 2019

Ph.D. Thesis





When choosing machine learning algorithms, one often has to balance two opposing factors: the computational cost and the accuracy of predictors. The test-time trade-off is particularly difficult to manage, because the test-time computational budget may be unknown at training time and may vary during testing. Analogously, given a novel data-set, one often lacks prior knowledge of the appropriate predictor complexity and training computation, and may furthermore want to interrupt or prolong training based on intermediate results.

In this work, we address these trade-offs between computation and accuracy via anytime prediction and learning: algorithms that can be interrupted at any time and still produce valid solutions, and whose solution quality improves with the computation consumed before interruption. Because they can adapt to any budget, anytime algorithms automatically make maximal use of whatever computation is available, even when the budget is unknown in advance.

To address the test-time trade-off, we study anytime predictors, whose prediction computation can be interrupted during testing. We start by developing provably near-optimal anytime linear predictors, and derive a theoretical performance limitation for anytime predictors based on ensemble methods. We then develop practical anytime prediction within individual neural networks via multi-objective optimization, as sketched below. Finally, by leveraging these anytime neural networks as weak learners, we circumvent the performance limitation on ensemble-based anytime predictors.
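
As a rough illustration of the neural-network approach, the following is a minimal sketch, not the thesis implementation: auxiliary prediction heads are attached to intermediate layers and all heads are trained jointly with a weighted sum of losses, so that a valid prediction is available if computation is interrupted after any block. All class names, layer sizes, and loss weights here are illustrative assumptions.

    # Minimal sketch of anytime prediction within a single network.
    # Each auxiliary head is a valid early-exit prediction; heads are
    # trained jointly via a weighted multi-objective loss.
    # All names and sizes are illustrative assumptions.
    import torch.nn as nn

    class AnytimeNet(nn.Module):
        def __init__(self, in_dim=32, hidden=64, n_classes=10, n_blocks=3):
            super().__init__()
            self.blocks = nn.ModuleList()
            self.heads = nn.ModuleList()
            dim = in_dim
            for _ in range(n_blocks):
                self.blocks.append(
                    nn.Sequential(nn.Linear(dim, hidden), nn.ReLU()))
                self.heads.append(nn.Linear(hidden, n_classes))  # early exit
                dim = hidden

        def forward(self, x):
            outputs = []
            for block, head in zip(self.blocks, self.heads):
                x = block(x)
                outputs.append(head(x))  # usable if interrupted here
            return outputs

    def anytime_loss(outputs, y, weights=None):
        # Weighted sum of per-head losses; uniform weights by default.
        ce = nn.CrossEntropyLoss()
        weights = weights or [1.0] * len(outputs)
        return sum(w * ce(out, y) for w, out in zip(weights, outputs))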

For the train-time trade-off, we consider the neural architecture search problem, where one seeks the optimal neural network structure for a given data-set. We draw a parallel between this bi-level combinatorial optimization problem and feature selection for linear prediction, and develop an iterative network growth algorithm inspired by forward feature selection. We also consider training on large data-sets, and develop no-regret gradient boosting algorithms for stochastic data streams, sketched below.
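
For the streaming boosting component, the following is a minimal sketch under stated assumptions, not the thesis algorithm: each round draws one fresh minibatch from the stream as a stochastic estimate of the functional gradient, and fits a weak learner to the residual (the negative gradient, for squared loss). The stream interface, weak learner choice, and step size are illustrative assumptions.

    # Sketch of gradient boosting on a stochastic data stream (squared loss).
    # `stream` is assumed to be an iterator yielding (X, y) minibatches.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boost_on_stream(stream, n_rounds=50, lr=0.1):
        learners = []

        def predict(X):
            if not learners:
                return np.zeros(len(X))
            return sum(lr * h.predict(X) for h in learners)

        for _ in range(n_rounds):
            X, y = next(stream)            # one stochastic minibatch
            residual = y - predict(X)      # negative gradient of squared loss
            h = DecisionTreeRegressor(max_depth=3).fit(X, residual)
            learners.append(h)
        return predict                     # anytime: usable after any round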

124 pages

Thesis Committee:
J. Andrew Bagnell (Co-Chair)
Martial Hebert (Co-Chair)
Ruslan Salakhutdinov
Rich Caruana (Microsoft Research)

Roni Rosenfeld, Head, Machine Learning Department
Tom M. Mitchell, Interim Dean, School of Computer Science

