Detecting Meaningful Clusters from High-dimensional Data: A Strongly Consistent Sparse Center-based Clustering Approach

Article Type

Research Article

Publication Title

IEEE Transactions on Pattern Analysis and Machine Intelligence


In this paper, we propose a Lasso Weighted k-means ($LW$-k-means) algorithm, a simple yet efficient sparse clustering procedure for high-dimensional data in which the number of features ($p$) can be much larger than the number of observations ($n$). The $LW$-k-means method imposes an $\ell_1$ regularization term directly on the feature weights to induce feature selection within a sparse clustering framework. To optimize the proposed objective, we develop a simple block-coordinate descent type algorithm whose time complexity resembles that of Lloyd's method. In addition, we establish the strong consistency of the $LW$-k-means procedure; in general, no such consistency proof is available for conventional sparse k-means algorithms. We test $LW$-k-means on a number of synthetic and real-life datasets, and a detailed experimental analysis shows that its performance is highly competitive with the baselines and with state-of-the-art procedures for center-based high-dimensional clustering, not only in clustering accuracy but also in computational time.
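The abstract does not state the exact $LW$-k-means objective, so the following is only a minimal sketch of the general pattern it describes: block-coordinate descent that alternates Lloyd-style assignment and center steps with a closed-form, $\ell_1$-penalized update of the feature weights. The objective is a hypothetical stand-in chosen so that every block has a closed form, namely $\min_{C,\Theta,w\ge 0} -\sum_l w_l B_l + \alpha\sum_l w_l^2 + \lambda\sum_l w_l$, where $B_l = T_l - D_l$ is the part of feature $l$'s total sum of squares $T_l$ not left in the within-cluster sum of squares $D_l$. It is not the paper's objective. The names `l1_weighted_kmeans`, `lam`, and `alpha` are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def weighted_sq_dist(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Feature-weighted squared Euclidean distance sum_l w_l * (x_l - c_l)^2."""
    return sum(wl * (xl - cl) ** 2 for xl, cl, wl in zip(x, c, w))


def l1_weighted_kmeans(X: Matrix, k: int, lam: float, alpha: float = 1.0,
                       n_iter: int = 50, seed: int = 0) -> Tuple[List[int], Matrix, List[float]]:
    """Hypothetical block-coordinate descent (NOT the published LW-k-means) for
        min_{C, Theta, w >= 0}  -sum_l w_l * B_l + alpha * sum_l w_l^2 + lam * sum_l w_l,
    where B_l = T_l - D_l (total minus within-cluster sum of squares of feature l).
    """
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    centers = [list(X[i]) for i in rng.sample(range(n), k)]  # k distinct data points
    w = [1.0 / p] * p                                        # start from uniform weights
    mean = [sum(row[l] for row in X) / n for l in range(p)]
    T = [sum((row[l] - mean[l]) ** 2 for row in X) for l in range(p)]
    labels: List[int] = [-1] * n
    for _ in range(n_iter):
        # Block 1 (C): nearest center under the current weighted distance.
        new_labels = [min(range(k), key=lambda j: weighted_sq_dist(x, centers[j], w))
                      for x in X]
        # Block 2 (Theta): each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        for j in range(k):
            members = [X[i] for i in range(n) if new_labels[i] == j]
            if members:
                centers[j] = [sum(m[l] for m in members) / len(members) for l in range(p)]
        # Block 3 (w): exact minimizer of -w*B + alpha*w^2 + lam*w over w >= 0,
        # i.e. B_l soft-thresholded at lam, so weak features get weight exactly 0.
        D = [sum((X[i][l] - centers[new_labels[i]][l]) ** 2 for i in range(n))
             for l in range(p)]
        w = [max(T[l] - D[l] - lam, 0.0) / (2.0 * alpha) for l in range(p)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers, w


if __name__ == "__main__":
    # Feature 0 separates two groups; features 1 and 2 are low-variance noise.
    X = [[0.0, 0.1, -0.2], [0.2, -0.1, 0.1], [-0.1, 0.2, 0.0], [0.1, -0.2, 0.2],
         [10.0, 0.1, 0.0], [10.2, -0.2, 0.1], [9.9, 0.2, -0.1], [10.1, -0.1, -0.2]]
    labels, centers, w = l1_weighted_kmeans(X, k=2, lam=1.0)
    print(labels, w)  # the two noise features receive weight exactly 0.0
```

Because each block solves its own subproblem exactly, this stand-in objective never increases across iterations, and one pass costs $O(nkp)$, the same order as a Lloyd iteration. That mirrors the complexity claim above, though only for this hypothetical formulation.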



Publication Date