Center for Automated Learning and Discovery
School of Computer Science, Carnegie Mellon University
Tools for Large Graph Mining
We attempt to answer these questions in two parts. First, we answer questions targeted at applications: what patterns or properties of a graph are important for solving specific problems? Here, we investigate the propagation behavior of a computer virus over a network, and find a simple formula for the epidemic threshold (beyond which any viral outbreak might become an epidemic). We find an "information survival threshold" which determines whether, in a sensor or P2P network with failing nodes and links, a piece of information will survive or not. We also develop a scalable, parameter-free method for finding groups of "similar" nodes in a graph, corresponding to homogeneous regions (or CrossAssociations) in the binary adjacency matrix of the graph. This can help navigate the structure of the graph and find non-obvious patterns.
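The abstract does not state the epidemic threshold formula itself. As an illustration only, the sketch below assumes one form commonly reported for results of this kind: the threshold is taken to be the reciprocal of the largest eigenvalue of the graph's adjacency matrix. The function names, the dense list-of-lists representation, and the iteration count are hypothetical choices made for this example, not details taken from the report.

```python
def largest_eigenvalue(adj, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Assumes a connected, undirected graph. Power iteration is run on A + I
    rather than A so that bipartite graphs (e.g. a star) do not oscillate;
    the shift of 1 is subtracted again at the end.
    """
    n = len(adj)
    v = [1.0] * n
    shifted = 0.0
    for _ in range(iterations):
        # w = (A + I) v
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        shifted = max(abs(x) for x in w)   # max-norm estimate of the eigenvalue
        v = [x / shifted for x in w]
    return shifted - 1.0


def epidemic_threshold(adj):
    """Assumed form of the threshold: 1 / (largest adjacency eigenvalue)."""
    return 1.0 / largest_eigenvalue(adj)


# Complete graph on 4 nodes: every node is linked to every other node.
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

# Star graph: node 0 is linked to four leaves.
star = [[0] * 5 for _ in range(5)]
for leaf in range(1, 5):
    star[0][leaf] = star[leaf][0] = 1

print(epidemic_threshold(k4))
print(epidemic_threshold(star))
```

Under this assumed form, a more densely connected graph has a larger leading eigenvalue and therefore a lower threshold, so an outbreak can take hold more easily.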
In the second part of our work, we investigate recurring patterns in real-world graphs, to gain a deeper understanding of their structure. This leads to the development of the R-MAT model of graph generation for creating synthetic but "realistic" graphs, which match many of the patterns found in real-world graphs, including power-law and lognormal degree distributions, small diameter and "community" effects.
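To make the idea of R-MAT generation concrete, here is a minimal sketch of a recursive-matrix edge generator. It assumes the usual description of the approach: the adjacency matrix is split into four quadrants, one quadrant is chosen with probabilities a, b, c, d, and the choice is repeated inside the chosen quadrant until a single cell (an edge) is reached. The function name, the default probabilities, and the decision to keep duplicate edges are assumptions made for this illustration and are not taken from the report.

```python
import random


def rmat_edges(scale, num_edges, a=0.45, b=0.15, c=0.15, d=0.25, seed=0):
    """Generate directed edges over 2**scale nodes by recursive quadrant choice.

    a, b, c, d are the (assumed) probabilities of descending into the
    top-left, top-right, bottom-left and bottom-right quadrant respectively.
    Duplicate edges are kept in this sketch.
    """
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("quadrant probabilities must sum to 1")
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        row, col = 0, 0
        for _ in range(scale):
            # Each level of recursion fixes one more bit of the row and column.
            row <<= 1
            col <<= 1
            r = rng.random()
            if r < a:
                pass                # top-left quadrant
            elif r < a + b:
                col |= 1            # top-right quadrant
            elif r < a + b + c:
                row |= 1            # bottom-left quadrant
            else:
                row |= 1            # bottom-right quadrant
                col |= 1
        edges.append((row, col))
    return edges


edges = rmat_edges(scale=4, num_edges=100)
print(edges[:5])
```

Skewed quadrant probabilities concentrate edges on a small set of nodes, which is one way such a generator can produce heavy-tailed degree distributions.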