**
Subquadratic Approximation Algorithms For Clustering Problems
in High Dimensional Spaces
**

*
Allan Borodin, Rafail Ostrovsky and Yuval Rabani
*

**
Abstract:
**

One of the central problems in data-mining, computational biology,
statistical analysis, computer vision, geographic analysis, pattern
recognition, distributed protocols and the web hierarchical search
engines (like Yahoo) is the question of classification of data
according to some clustering rule. Often the data is noisy and even
approximate classification is of extreme importance. The difficulty of
such classification stems from the fact that usually the data has many
incomparable attributes, and often results in the question of
clustering problems in high dimensional spaces. Since they require
measuring distance between every pair of data points, standard
algorithms for computing the exact clustering solutions use quadratic
or "nearly quadratic" running time; i.e. O(d n^{2-a(d)})
where n is the number of data points, d is the dimension of the
space and a(d) approaches 0 as d grows. In this paper, we
show (for three fairly natural clustering rules) that computing an
approximate solution can be done much more efficiently. More
specifically, for the agglomerative clustering (used, for example, in
in the Alta Vista^{2-y} where Y > 0 depends only on the
approximation parameter ∈ and is independent of the dimension
d.

**comment:**
Appeared in
Mahine Learning Journal Special Issue: Theoretical Advances in Data Clustering
(Guest Editors: Nina Mishra and Rajeev Motwani) 56 (1-3): 153-167, 2004

Preliminary version in
* proceedings of
The 31'st ACM Symposium on Theory of Computing (STOC-99)*

Fetch PostScript file of the paper Fetch PDF file of the paper

Back to Publications List |