Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach

Document Type

Research Article

Publication Title

IEEE Transactions on Pattern Analysis and Machine Intelligence


In the context of high-dimensional clustering, feature weighting has gained considerable importance over the years as a way to capture the relative importance of different features in revealing the cluster structure of a dataset. However, the popular techniques in this area either fail to perform feature selection or do not preserve the simplicity of Lloyd's heuristic for solving the k-means problem and the like. In this paper, we propose the Lasso Weighted k-means (LW-k-means) algorithm, a simple yet efficient sparse clustering procedure for high-dimensional data in which the number of features (p) can be much higher than the number of observations (n). The LW-k-means method imposes an ℓ1 regularization term involving the feature weights directly to induce feature selection in a sparse clustering framework. We develop a simple block-coordinate descent type algorithm, with time complexity resembling that of Lloyd's method, to optimize the proposed objective. In addition, we establish the strong consistency of the LW-k-means procedure. Such an analysis of large-sample properties is, in general, not available for conventional sparse k-means algorithms. LW-k-means is tested on a number of synthetic and real-life datasets, and a detailed experimental analysis shows that the method is highly competitive with the baselines as well as the state-of-the-art procedures for center-based high-dimensional clustering, not only in terms of clustering accuracy but also with respect to computational time.
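
As a rough illustration of the kind of procedure described above, the following Python sketch alternates Lloyd-style assignment and centroid steps with a soft-thresholded feature-weight step. It is not the paper's algorithm: the abstract does not state the objective or the update rules, so the weight update here is an assumed stand-in (soft-thresholding the per-feature between-cluster sum of squares with a fixed threshold, then rescaling to unit ℓ2 norm, in the spirit of earlier sparse k-means formulations). The names `sparse_weighted_kmeans`, `soft_threshold`, `init_centers`, and `lam` are hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Shrink a non-negative score toward zero by lam (never below zero)."""
    return max(value - lam, 0.0)


def sparse_weighted_kmeans(X, init_centers, lam, n_iter=20):
    """Illustrative block-coordinate loop (assumed updates, not the paper's).

    X            : list of n rows, each a list of p floats
    init_centers : list of k starting centers (each a list of p floats)
    lam          : soft-threshold level applied to per-feature scores
    Returns (labels, centers, weights).
    """
    n, p = len(X), len(X[0])
    k = len(init_centers)
    centers = [list(c) for c in init_centers]
    weights = [1.0 / math.sqrt(p)] * p  # start from uniform feature weights
    overall = [sum(row[l] for row in X) / n for l in range(p)]
    labels = [-1] * n

    for _ in range(n_iter):
        # Block 1: assign each point to the nearest center under the
        # feature-weighted squared Euclidean distance.
        new_labels = []
        for row in X:
            dists = [
                sum(w * (x - z) ** 2 for w, x, z in zip(weights, row, c))
                for c in centers
            ]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels

        # Block 2: recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        sizes = [labels.count(j) for j in range(k)]
        for j in range(k):
            if sizes[j] == 0:
                continue
            members = [row for row, lab in zip(X, labels) if lab == j]
            centers[j] = [sum(m[l] for m in members) / sizes[j] for l in range(p)]

        # Block 3 (assumed rule): score each feature by its between-cluster
        # sum of squares, soft-threshold the scores, rescale to unit l2 norm.
        bcss = [
            sum(
                sizes[j] * (centers[j][l] - overall[l]) ** 2
                for j in range(k)
                if sizes[j] > 0
            )
            for l in range(p)
        ]
        shrunk = [soft_threshold(b, lam) for b in bcss]
        norm = math.sqrt(sum(s * s for s in shrunk))
        if norm > 0:  # if every score is thresholded away, keep old weights
            weights = [s / norm for s in shrunk]

    return labels, centers, weights
```

Features whose score falls below `lam` receive a weight of exactly zero and drop out of the distance computation, which is how an ℓ1-type penalty can perform feature selection. Each pass costs on the order of n·k·p operations, the same order as one iteration of Lloyd's method.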

Publication Date