Machine Learning Department
School of Computer Science, Carnegie Mellon University
Uncovering Structure in High-Dimensions:
Extracting knowledge and providing insight into the complex mechanisms underlying
noisy high-dimensional data sets is of utmost importance in many scientific
domains. Statistical modeling has become ubiquitous in the analysis of high-dimensional
functional data in search of a better understanding of cognitive mechanisms, in
the exploration of large-scale gene regulatory networks in the hope of developing drugs
for lethal diseases, and in the prediction of stock market volatility in the hope of beating
the market. Statistical analysis of these high-dimensional data sets is possible only
if the estimation procedure exploits hidden structure underlying the data.
This thesis develops flexible estimation procedures with provable theoretical guarantees for uncovering unknown hidden structure underlying the data-generating process. Of particular interest are procedures that can be used on high-dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Learning in high dimensions is difficult due to the curse of dimensionality; however, special problem structure makes inference possible. Because of its importance for scientific discovery, we emphasize consistent structure recovery throughout the thesis. Particular focus is given to two important problems: semi-parametric estimation of networks and feature selection in multi-task learning.
SCS Technical Report Collection