|
CMU-CS-04-108
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-04-108
A Statistical Approach to Algorithmic Analysis
of High-Dimensional Nearest-Neighbor Search
Alexander Gray
February 2004
CMU-CS-04-108.ps
CMU-CS-04-108.pdf
Keywords: Algorithmic analysis, data-dependent analysis,
k-nearest neighbor, range queries, adaptive data structures,
high dimensionality, distance distribution
The most commonly used algorithms for spatial data searches such as
k-nearest-neighbor and spherical range queries are based on a class
of data structures we call space-partitioning trees, which have
remained the pragmatic method of choice due to their ability to often
empirically provide sub-linear efficiency in reported dimensionalities
in the tens and occasionally beyond, in constrast to methods designed
for worst-case optimality. Despite long-standing practical interest
in a more realistic runtime analysis of such methods, particularly in
the high-dimensional case demanded by many modern applications, little
further progress has been made since the seminal expected-time
analysis of 1977. One fundamental reason for this is that algorithm
analysis has not, to date, provided examples of analyses which link
algorithmic runtime to probabilistic properties of the input
distribution. This paper introduces some basic statistical machinery
for making this link, and thereby presents initial steps toward
providing a statistically principled framework for
distribution-dependent runtime analysis of space-partitioning-based
algorithms, with an emphasis on providing explanations for their
observed behavior in high-dimensional spaces.
17 pages
|