CMU-ML-10-112
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Nonparametric Learning in High Dimensions

Han Liu

December 2010

Ph.D. Thesis


Keywords: Machine learning, statistical inference, nonparametric methods, curse of dimensionality, regression, classification, multi-task learning, density estimation, undirected graphical models, structure learning, spatial-temporal adaptive learning

This thesis develops flexible and principled nonparametric learning algorithms to explore, understand, and predict high dimensional and complex datasets. Such data appear frequently in modern scientific domains and lead to numerous important applications. For example, exploring high dimensional functional magnetic resonance imaging data helps us better understand brain function; inferring large-scale gene regulatory networks is crucial for new drug design and development; and detecting anomalies in high dimensional transaction databases is vital for corporate and government security.

Our main results include a rigorous theoretical framework and efficient nonparametric learning algorithms that exploit hidden structures to overcome the curse of dimensionality when analyzing massive high dimensional datasets. These algorithms have strong theoretical guarantees and provide high dimensional nonparametric recipes for many important learning tasks, ranging from unsupervised exploratory data analysis to supervised predictive modeling. In this thesis, we address three aspects:

1. Understanding the statistical theories of high dimensional nonparametric inference, including risk, estimation, and model selection consistency;
2. Designing new methods for different data-analysis tasks, including regression, classification, density estimation, graphical model learning, multi-task learning, and spatial-temporal adaptive learning;
3. Demonstrating the usefulness of these methods in scientific applications, including functional genomics, cognitive neuroscience, and meteorology.

In the last part of this thesis, we also present a vision for the future of high dimensional and large-scale nonparametric inference.

305 pages

