Clustering High-dimensional Data with Ordered Weighted ℓ1 Regularization
Document Type
Conference Article
Publication Title
Proceedings of Machine Learning Research
Abstract
Clustering complex high-dimensional data is particularly challenging because the signal-to-noise ratio in such data is significantly lower than in its classical counterparts. This is mainly because most of the features describing a data point carry little to no information about the natural grouping of the data. Filtering out such features is thus critical to harnessing meaningful information from such large-scale data. Many recent methods have attempted to learn feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most perform poorly in high dimensions, especially on microarray and single-cell RNA-seq data. This paper extends the merits of weighted center-based clustering through the Ordered Weighted ℓ1 (OWL) norm for better feature selection. Appealing to the elegant properties of block coordinate descent and Frank-Wolfe algorithms, we are able not only to maintain computational efficiency but also to outperform the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of O(√(k log p/n)) under model sparsity, bridging the gap between the theory and practice of weighted clustering.
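For readers unfamiliar with the regularizer named in the abstract, the following is a minimal sketch of the standard OWL norm definition: the entries of a vector are sorted by decreasing magnitude and paired with a non-negative, non-increasing weight sequence. The function name `owl_norm` and the example weights are illustrative assumptions; this is not the authors' implementation, and it does not reproduce the clustering objective or the optimization scheme of the paper.

```python
import numpy as np

def owl_norm(x, w):
    """Ordered Weighted l1 (OWL) norm: sum_i w[i] * |x|_(i),
    where |x|_(1) >= |x|_(2) >= ... are the sorted magnitudes of x.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.ndim != 1 or x.shape != w.shape:
        raise ValueError("x and w must be 1-D arrays of the same length")
    # The OWL weights must be non-negative and non-increasing.
    if np.any(w < 0) or np.any(np.diff(w) > 0):
        raise ValueError("w must be non-negative and non-increasing")
    # Sort magnitudes in decreasing order, then pair with the weights.
    abs_sorted = np.sort(np.abs(x))[::-1]
    return float(np.dot(w, abs_sorted))

x = [1.0, -3.0, 2.0]
print(owl_norm(x, [3.0, 2.0, 1.0]))  # general OWL weights
print(owl_norm(x, [1.0, 1.0, 1.0]))  # equal weights recover the l1 norm
print(owl_norm(x, [1.0, 0.0, 0.0]))  # a single leading weight recovers the max norm
```

With equal weights the OWL norm reduces to the ℓ1 norm, and with a single nonzero leading weight it reduces to the ℓ∞ norm, which is why it is often described as interpolating between the two.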
First Page
7176
Last Page
7189
Publication Date
1-1-2023
Recommended Citation
Chakraborty, Chandramauli; Paul, Sayan; Chakraborty, Saptarshi; and Das, Swagatam, "Clustering High-dimensional Data with Ordered Weighted ℓ1 Regularization" (2023). Conference Articles. 588.
https://digitalcommons.isical.ac.in/conf-articles/588