Machine Learning Department
School of Computer Science, Carnegie Mellon University


Graph Structured Normal Means Inference

James Sharpnack

May 2013

Ph.D. Thesis


Keywords: Normal means, Gaussian goodness-of-fit, graph structure, estimation, detection, localization, anomaly detection, statistical inference, network, Laplacian, spanning tree wavelets, spectral scan

This thesis addresses statistical estimation and testing of signals over a graph when measurements are noisy and high-dimensional. Graph structured patterns appear in applications as diverse as sensor networks, virology in human networks, congestion in internet routers, and advertising in social networks. We will develop asymptotic guarantees of the performance of statistical estimators and tests, by stating conditions for consistency by properties of the graph (e.g. graph spectra). The goal of this thesis is to demonstrate theoretically that by exploiting the graph structure one can achieve statistical consistency in extremely noisy conditions.

We begin with the study of a projection estimator called Laplacian eigenmaps, and find that eigenvalue concentration plays a central role in the ability to estimate graph structured patterns. We continue with the study of the edge lasso, a least squares procedure with total variation penalty, and determine combinatorial conditions under which changepoints (edges across which the underlying signal changes) on the graph are recovered. We will shift focus to testing for anomalous activations in the graph, using the scan statistic relaxations, the spectral scan statistic and the graph ellipsoid scan statistic. We will also show how one can form a decomposition of the graph from a spanning tree which will lead to a test for activity in the graph. This will lead to the construction of a spanning tree wavelet basis, which can be used to localize activations on the graph.

99 pages

SCS Technical Report Collection
School of Computer Science