Machine Learning Department
School of Computer Science, Carnegie Mellon University
Data Mining Meets HCI:
Making Sense of Large Graphs
Duen Horng (Polo) Chau
Keywords: Graph Mining, Data Mining, Machine Learning, Human-Computer
Interaction, HCI, Graphical Models, Inference, Big Data, Sensemaking,
Visualization, eBay Auction Fraud Detection, Symantec Malware Detection,
Belief Propagation, Random Walk, Guilt by Association, Polonium, NetProbe,
Apolo, Feldspar, Graphite
We have entered the age of big data. Massive datasets are now
common in science, government and enterprises. Yet, making sense
of these data remains a fundamental challenge. Where do we start our
analysis? Where to go next? How to visualize our findings?
We answer these questions by bridging Data Mining and Human-
Computer Interaction (HCI) to create tools for making sense of graphs
with billions of nodes and edges, focusing on:
(1) Attention Routing: we introduce this idea, based on anomaly
detection, that automatically draws people’s attention to interesting
areas of the graph to start their analyses. We present two examples:
Polonium unearths malware from 37 billion machine-file relationships;
NetProbe fingers bad guys who commit auction fraud.
(2) Mixed-Initiative Sensemaking: we present two examples
that combine machine inference and visualization to help users locate
next areas of interest: Apolo guides users to explore large graphs
by learning from few examples of user interest; Graphite finds interesting
subgraphs, based on only fuzzy descriptions drawn graphically.
(3) Scaling Up: we show how to enable interactive analytics of
large graphs by leveraging Hadoop, staging of operations, and approximate
computation.
This thesis contributes to data mining, HCI, and, importantly,
their intersection, including: interactive systems and algorithms
that scale; theories that unify graph mining approaches; and paradigms
that overcome fundamental challenges in visual analytics.
Our work is making an impact on academia and society: Polonium
protects 120 million people worldwide from malware; NetProbe made
headlines on CNN, WSJ and USA Today; Pegasus won an open-source
software award; Apolo helps DARPA detect insider threats
and prevent exfiltration.
We hope our Big Data Mantra "Machine for Attention Routing,
Human for Interaction" will inspire more innovations at the crossroads
of data mining and HCI.