Computer Science Department
School of Computer Science, Carnegie Mellon University
A modern practitioner of machine learning must often consider trade-offs between accuracy and complexity when selecting from available machine learning algorithms. Prediction tasks can range from requiring real-time performance to being largely unconstrained in their use of computational resources. In each setting, an ideal algorithm utilizes as much of the available computation as possible to provide the most accurate result.
This issue is further complicated by applications where the computational constraints are not fixed in advance. In many applications predictions are often needed in time to allow for adaptive behaviors which respond to real-time events. Such constraints often rely on a number of factors at prediction time, making it difficult to select a fixed prediction algorithm a priori. In these situations, an ideal approach is to use an anytime prediction algorithm. Such an algorithm rapidly produces an initial prediction and then continues to refine the result as time allows, producing final results which dynamically improve to fit any computational budget.
Our approach uses a greedy, cost-aware extension of boosting which fuses the disparate areas of functional gradient descent and greedy sparse approximation algorithms. By using a cost-greedy selection procedure our algorithms provide an intuitive and effective way to trade-off computational cost and accuracy for any computational budget. This approach learns a sequence of predictors to apply as time progresses, using each new result to update and improve the current prediction as time allows. Furthermore, we present theoretical work in the different areas we have brought together, and show that our anytime approach is guaranteed to achieve near-optimal performance with respect to unknown prediction time budgets. We also present the results of applying our algorithms to a number of problem domains such as classification and object detection that indicate that our approach to anytime prediction is more efficient than trying to adapt a number of existing methods to the anytime prediction problem.
We also present a number of contributions in areas related to our primary focus. In the functional gradient descent domain, we present convergence results for smooth objectives, and show that for non-smooth objectives the widely used approach fails both in theory and in practice. To rectify this we present new algorithms and corresponding convergence results for this domain. We also present novel, time-based versions of a number of greedy feature selection algorithms and give corresponding approximation guarantees for the performance of these algorithms.