Random direction divisive clustering
Projection methods for dimension reduction have enabled the discovery of otherwise unattainable structure in ultra-high-dimensional data. More recently, one such method, Random Projection, has been shown to yield high-quality data representations with minimal computational effort, even when the data dimension is in the hundreds of thousands or millions. Here, we couple this dimension reduction technique with data clustering algorithms specially designed for high-dimensional cases. First, we show that the theoretical properties of both components can be combined in a sound manner, promising an effective clustering framework. Experimental analysis on a series of simulated and real ultra-high-dimensional data scenarios then shows that the resulting algorithms achieve high-quality data partitions orders of magnitude faster. (C) 2012 Elsevier B.V. All rights reserved.
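
To make the coupling of Random Projection with a divisive clustering step concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes a dense Gaussian projection matrix and a single bisection at the widest gap along one random direction; the abstract specifies neither choice. The function names (`random_projection`, `split_on_random_direction`, `demo`) and the synthetic two-group data are hypothetical and serve only as illustration.

```python
import math
import random


def random_projection(points, k, rng):
    """Map d-dimensional points to k dimensions with a Gaussian random matrix.

    Assumption: entries are i.i.d. N(0, 1/k), a common textbook choice that
    preserves squared norms in expectation.
    """
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
              for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix]
            for p in points]


def split_on_random_direction(points, rng):
    """Bisect the points at the widest gap of their 1-D random projection.

    Assumption: the widest-gap rule stands in for whatever splitting
    criterion a real divisive algorithm would use.
    """
    d = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    scores = [sum(w * x for w, x in zip(direction, p)) for p in points]
    order = sorted(range(len(points)), key=scores.__getitem__)
    # Index in the sorted order where consecutive projections differ most.
    cut = max(range(1, len(order)),
              key=lambda i: scores[order[i]] - scores[order[i - 1]])
    return sorted(order[:cut]), sorted(order[cut:])


def demo(seed=0):
    """Reduce two synthetic 200-dimensional groups to 10 dimensions, then split once."""
    rng = random.Random(seed)
    d = 200
    group_a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(20)]
    group_b = [[rng.gauss(5.0, 1.0) for _ in range(d)] for _ in range(20)]
    reduced = random_projection(group_a + group_b, k=10, rng=rng)
    return split_on_random_direction(reduced, rng)
```

Calling `demo()` returns two lists of point indices that together cover all 40 synthetic points. A full divisive scheme would presumably apply the split recursively to each part and would need a stopping rule; both are left out of this sketch.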