CMU-CS-15-126
Computer Science Department
School of Computer Science, Carnegie Mellon University
Exploring and Making Sense of Large Graphs
Danai Koutra
August 2015
Ph.D. Thesis
Keywords:
Data mining, graph mining and exploration, understanding graphs, graph similarity, graph matching, network alignment, graph summarization, compression, pattern mining, outlier detection, anomaly detection, attribution, culprits, scalability, fast algorithms, models, visualization, social networks, brain graphs, connectomes, VoG, FaBP, DeltaCon, DeltaCon-Attr, TimeCrunch, BiG-Align, Uni-Align
Graphs naturally represent information ranging from links between webpages, to
friendships in social networks, to connections between neurons in our brains.
These graphs often span billions of nodes and interactions between
them. Within this deluge of interconnected data, how can we find the most
important structures and summarize them? How can we efficiently visualize
them? How can we detect anomalies that indicate critical events, such as an
attack on a computer system, disease formation in the human brain, or the
fall of a company?
To gain insights into these problems, this thesis focuses on developing
scalable, principled discovery algorithms that combine globality with locality
to make sense of one or more graphs. Beyond fast algorithms, we also
contribute graph-theoretical ideas, models, and real-world applications in two main areas:
- Single-Graph Exploration: We show how to interpretably
summarize a single graph by identifying its important structures. We complement summarization with inference, which leverages information about a few entities (obtained via summarization or other methods) and the
network structure to efficiently and effectively learn information about
the unknown entities.
- Multiple-Graph Exploration: We extend the idea of single-graph
summarization to time-evolving graphs, and show how to scalably discover
temporal patterns. Apart from summarization, we argue that graph similarity
is often the underlying problem in a host of applications where
multiple graphs occur (e.g., temporal anomaly detection, discovery of
behavioral patterns), and we present principled, scalable algorithms for
aligning networks and measuring their similarity.
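The inference idea in the first bullet (propagating what is known about a few entities through the network structure to the unknown ones) can be illustrated with a toy guilt-by-association sketch. This is plain iterative label propagation, not the FaBP algorithm named in the keywords; the function `propagate`, the toy path graph, and the +1/-1 seed scores are hypothetical choices made only for illustration.

```python
# Toy guilt-by-association sketch: iterative label propagation.
# Hypothetical example; this is NOT the thesis's FaBP method.

def propagate(adj, seeds, iters=50):
    """adj: dict mapping node -> list of neighbors (undirected graph).
    seeds: dict mapping the few labeled nodes -> score in [-1, 1].
    Returns a dict mapping every node -> inferred score."""
    scores = {v: seeds.get(v, 0.0) for v in adj}  # unknown nodes start neutral
    for _ in range(iters):
        new = {}
        for v, nbrs in adj.items():
            if v in seeds:
                new[v] = seeds[v]  # known labels stay clamped
            elif nbrs:
                # unknown node takes the average score of its neighbors
                new[v] = sum(scores[u] for u in nbrs) / len(nbrs)
            else:
                new[v] = 0.0  # isolated node: no evidence either way
        scores = new
    return scores


if __name__ == "__main__":
    # Path graph a-b-c-d-e; only the two endpoints are labeled.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
           "d": ["c", "e"], "e": ["d"]}
    print(propagate(adj, {"a": 1.0, "e": -1.0}))
    # → {'a': 1.0, 'b': 0.5, 'c': 0.0, 'd': -0.5, 'e': -1.0}
```

Fixing the seed scores and repeatedly averaging neighbor scores yields, on this path, a linear interpolation between the two labeled endpoints: nodes closer to a labeled entity inherit more of its label.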
We leverage techniques from diverse areas, such as matrix algebra, graph
theory, optimization, information theory, machine learning, finance, and social science, to solve real-world problems. We have applied our exploration
algorithms to massive datasets, including a Web graph of 6.6 billion
edges, a Twitter graph of 1.8 billion edges, brain graphs with up to
90 million edges, collaboration and peer-to-peer networks, and browser
logs, all spanning millions of users and interactions.
230 pages
Thesis Committee:
Christos Faloutsos (Chair)
William Cohen
Roni Rosenfeld
Eric Horvitz (Microsoft Research, Redmond)
Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science