CMU-CS-02-193
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-02-193

Adventures in Ultra-Small-Space Clustering

Adam Meyerson, Liadan O'Callaghan*, Serge Plotkin*

December 2002

CMU-CS-02-193.ps
CMU-CS-02-193.pdf


Keywords: Algorithms, approximation, clustering, sampling, k-median, facility location


We give a sampling-based algorithm for the k-Median problem, with running time where [.....] k is the desired cluster number and is a confidence parameter. This is the first k-Median algorithm with fully polynominial running time that is independent of n, the size of the data set. It gives a solution that is, with high probability, an O(1) approximation if each cluster in the optimal solution has [....] points. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.

13 pages

* Department of Computer Science, Stanford University


Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu