CMU-ML-15-101
Machine Learning Department
School of Computer Science, Carnegie Mellon University



CMU-ML-15-101

Computational and Statistical Advances
in Testing and Learning

Aaditya Kumar Ramdas

July 2015

Ph.D. Thesis*

CMU-ML-15-101.pdf


Keywords: Active learning, convex optimization, hypothesis testing, computation-statistics tradeoffs, sequential analysis, stochastic oracles, randomized algorithms


This thesis makes fundamental computational and statistical advances in testing and estimation, making critical progress in theory and application of classical statistical methods like classification, regression and hypothesis testing, and understanding the relationships between them. Our work connects multiple fields in often counter-intuitive and surprising ways, leading to new theory, new algorithms, and new insights, and ultimately to a cross-fertilization of varied fields like optimization, statistics and machine learning.

The first of three thrusts has to do with active learning, a form of sequential learning from feedback-driven queries that often has a provable statistical advantage over passive learning. We unify concepts from two seemingly different areas – active learning and stochastic first-order optimization. We use this unified view to develop new lower bounds for stochastic optimization using tools from active learning and new algorithms for active learning using ideas from optimization. We also study the effect of feature noise, or errors-in-variables, on the ability to actively learn.

The second thrust deals with the development and analysis of new convex optimization algorithms for classification and regression problems. We provide geometrical and convex analytical insights into the role of the margin in margin-based classification, and develop new greedy primal-dual algorithms for non-linear classification. We also develop a unified proof for convergence rates of randomized algorithms for the ordinary least squares and ridge regression problems in a variety of settings, with the purpose of investigating which algorithm should be utilized in different settings. Lastly, we develop fast state-of-the-art numerically stable algorithms for an important univariate regression problem called trend filtering with a wide variety of practical extensions.

The last thrust involves a series of practical and theoretical advances in nonparametric hypothesis testing. We show that a smoothed Wasserstein distance allows us to connect many vast families of univariate and multivariate two sample tests. We clearly demonstrate the decreasing power of the families of kernel-based and distance-based two-sample tests and independence tests with increasing dimensionality, challenging existing folklore that they work well in high dimensions. Surprisingly, we show that these tests are automatically adaptive to simple alternatives and achieve the same power as other direct tests for detecting mean differences. We discover a computation-statistics tradeoff, where computationally more expensive two-sample tests have a provable statistical advantage over cheaper tests. We also demonstrate the practical advantage of using Stein shrinkage for kernel independence testing at small sample sizes. Lastly, we develop a novel algorithmic scheme for performing sequential multivariate nonparametric hypothesis testing using the martingale law of the iterated logarithm to near-optimally control both type-1 and type-2 errors.

One perspective connecting everything in this thesis involves the closely related and fundamental problems of linear regression and classification. Every contribution in this thesis, from active learning to optimization algorithms, to the role of the margin, to nonparametric testing fits in this picture. An underlying theme that repeats itself in this thesis, is the computational and/or statistical advantages of sequential schemes with feedback. This arises in our work through comparing active with passive learning, through iterative algorithms for solving linear systems instead of direct matrix inversions, and through comparing the power of sequential and batch hypothesis tests.

253 pages

*Machine Learning Department, School of Computer Science
Department of Statistics, Dietrich College

Thesis Committee:
Larry Wasserman (Co-Chair)
Aarti Singh (Co-Chair)
Ryan Tibshirani
Michael Jordan (University of California, Berkeley)
Arthur Gretton (University College London)

Tom M. Mitchell, Head, Machine Learning Department
Andrew W. Moore, Dean, School of Computer Science


SCS Technical Report Collection
School of Computer Science