CMU-ML-08-116
Machine Learning Department
School of Computer Science, Carnegie Mellon University



CMU-ML-08-116

ShatterPlots: Fast Tool for Mining Large Graphs

Ana Paula Appel*, Deepayan Chakrabarti**, Christos Faloutsos,
Ravi Kumar**, Jure Leskovec, Andrew Tomkins**

December 2008

CMU-ML-08-116.pdf


Keywords: Algorithms, graph mining, percolation, edges deletion


Graphs appear in several settings, like social networks, recommendation systems, and numerous more. A deep, recurring question is "How do real graphs look like?" That is, how can we separate real graphs from synthetic or real graphs with masked portions? The main contribution of this paper is ShatterPlots, a simple and powerful algorithm to tease out patterns of real graphs that help us spot fake/masked graphs. The idea is to shatter a graph, by deleting edges, force it to reach a critical ("Shattering") point, and study the properties at that point. One of our most discriminative patterns is the "NodeShatteringRatio": that can almost perfectly separate the real from the synthetic graphs of our extensive collection. Additional contributions of this paper are (a) the careful, scalable design of the algorithm that needs only O(E) time, (b) extensive experiments on a large collection of graphs (19 in total), with up to hundred of thousand of nodes and million edges; and (c) a wealth of observations and patterns, which show how to distinguish synthetic or masked graphs from real ones.

28 pages

*ICMC - USP São Carlos, Brazil
**Yahoo! Research


SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu