CMU-ML-13-106
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Uncovering Structure in High-Dimensions:
Networks and Multi-task Learning Problems

Mladen Kolar

July 2013

Ph.D. Thesis



Keywords: Complex Systems, Dynamic Networks, Feature Selection, Gaussian Graphical Models, High-dimensional Inference, Markov Random Fields, Multi-task Learning, Semiparametric Estimation, Sparsity, Structure Learning, Undirected Graphical Models, Variable Screening, Varying Coefficient


Extracting knowledge and providing insight into the complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Statistical modeling has become ubiquitous in the analysis of high-dimensional functional data in search of a better understanding of cognitive mechanisms, in the exploration of large-scale gene regulatory networks in the hope of developing drugs for lethal diseases, and in the prediction of stock market volatility in the hope of beating the market. Statistical analysis of these high-dimensional data sets is possible only if the estimation procedure exploits hidden structures underlying the data.

This thesis develops flexible estimation procedures with provable theoretical guarantees for uncovering unknown hidden structures underlying the data-generating process. Of particular interest are procedures that can be used on high-dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Learning in high dimensions is difficult due to the curse of dimensionality; however, special problem structure makes inference possible. Because of its importance for scientific discovery, we emphasize consistent structure recovery throughout the thesis. Particular focus is given to two important problems: semiparametric estimation of networks and feature selection in multi-task learning.

375 pages

