Computer Science Department
School of Computer Science, Carnegie Mellon University


Tri-Plots: Scalable Tools for Multidimensional Data Mining

Agma Traina*, Caetano Traina*, Spiros Papadimitriou, Christos Faloutsos

July 2001

Keywords: Data mining, multidimensional, box count

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pairwise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from?
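The second question can be made concrete by comparing every point of one cloud with every point of the other. The sketch below is a naive O(|A|·|B|) baseline written for illustration. It is not the method proposed in this report, and the name `cross_distance_extremes` is hypothetical. Its quadratic cost is the reason a scalable tool is attractive for large datasets.

```python
import math
from itertools import product
from typing import Sequence

Point = Sequence[float]


def cross_distance_extremes(a: Sequence[Point], b: Sequence[Point]) -> tuple[float, float]:
    """Return the (smallest, largest) Euclidean distance between a point of `a` and a point of `b`.

    Hypothetical brute-force baseline: it examines all |a| * |b| cross pairs.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    # Distances are taken only across the two clouds; pairs within one cloud are ignored.
    dists = [math.dist(p, q) for p, q in product(a, b)]
    return min(dists), max(dists)


healthy = [(0.0, 0.0), (1.0, 0.0)]
sick = [(3.0, 0.0), (4.0, 0.0)]
print(cross_distance_extremes(healthy, sick))  # (2.0, 4.0)
```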

We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer the above questions. We provide a set of rules on how to interpret a tri-plot, and we apply these rules to synthetic and real datasets. We also show how to use our tool for classification in cases where traditional methods (nearest neighbor, classification trees) may fail.
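The keywords point to box counting. The following is a minimal sketch, under our own assumptions, of how box counts of two clouds could be combined. It imposes a grid of cells of side r, counts the points C_A,i and C_B,i of each cloud in cell i, and takes the sum over i of C_A,i · C_B,i as an approximate count of cross-cloud pairs at scale r. Setting B = A gives a self-count. The helper names (`cell_counts`, `cross_count`, `tri_plot_counts`) and this exact formula are illustrative assumptions and are not taken from the report. The pq-plot generalization is not reproduced.

```python
import math
from collections import Counter
from typing import Sequence

Point = Sequence[float]


def cell_counts(points: Sequence[Point], r: float) -> Counter:
    """Count points per grid cell of side r (a cell is the tuple of floor(x / r) per axis)."""
    return Counter(tuple(math.floor(x / r) for x in p) for p in points)


def cross_count(a: Sequence[Point], b: Sequence[Point], r: float) -> int:
    """Approximate cross-pair count at scale r: sum over cells of C_A,i * C_B,i.

    Hypothetical helper; with b = a it yields the self-count sum of C_A,i ** 2.
    """
    ca, cb = cell_counts(a, r), cell_counts(b, r)
    # Only cells occupied by both clouds contribute a non-zero product.
    return sum(ca[c] * cb[c] for c in ca.keys() & cb.keys())


def tri_plot_counts(
    a: Sequence[Point], b: Sequence[Point], radii: Sequence[float]
) -> list[tuple[float, int, int, int]]:
    """For each r, return (r, self-count of A, self-count of B, cross-count of A and B)."""
    return [(r, cross_count(a, a, r), cross_count(b, b, r), cross_count(a, b, r)) for r in radii]


a = [(0.1, 0.1), (0.2, 0.2), (5.5, 5.5)]
b = [(0.3, 0.3), (9.5, 9.5)]
for row in tri_plot_counts(a, b, [1.0, 10.0]):
    print(row)
# (1.0, 5, 2, 2)
# (10.0, 9, 4, 6)
```

Plotting the three counts against r on log-log axes gives a picture in the spirit of a tri-plot. Each count needs only one pass over the data per radius, rather than a pass over all pairs. The interpretation rules are those given in the report and are not encoded in this sketch.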

20 pages

* Visiting from the Department of Computer Science and Statistics, University of S. Paulo at S. Carlos, Brazil.
