CMU-CS-12-124Computer Science Department School of Computer Science, Carnegie Mellon University
CMU-CS-12-124
Leman Akoglu, Jilles Vreeken*, Hanghang Tong**, August 2012
Keywords:
Link mining, connection subgraphs, sensemaking
Suppose we are given a large graph in which, by some external process, a handful of nodes are marked. What can we say about these marked nodes? Are they all close-by in the graph, or are they segregated into multiple groups? How can we automatically determine how many, if any, groups they form as well as find simple paths that connect the nodes in each group? We formalize the problem in terms of the Minimum Description Length principle: a set of paths is simple when we need few bits to describe each path from one node to another. For example, we want to avoid high-degree nodes, unless we need to visit many of its spokes. As such, the best partitioning requires the least number of bits to describe the paths that visit all marked nodes. We show that our formulation for finding simple paths between groups of nodes has connections to well-known other problems in graph theory, and is NP-hard. We propose fast effective solutions, and introduce DOT2DOT, an efficient algorithm for partitioning marked nodes as well as finding simple paths between nodes within parts. Experimentation shows DOT2DOT correctly groups nodes for which good connection paths can be constructed, while separating distant nodes. 27 pages
*University of Antwerp
| |

Return to:
SCS Technical Report Collection This page maintained by reports@cs.cmu.edu |