Clustering High-dimensional Data with Ordered Weighted ℓ1 Regularization

Document Type

Conference Article

Publication Title

Proceedings of Machine Learning Research

Abstract

Clustering complex high-dimensional data is particularly challenging because the signal-to-noise ratio in such data is significantly lower than in its classical counterparts. This is mainly because most of the features describing a data point carry little to no information about the natural grouping of the data. Filtering out such features is thus critical to extracting meaningful information from such large-scale data. Many recent methods have attempted to learn feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most perform poorly on high-dimensional data, especially microarray and single-cell RNA-seq data. This paper extends the merits of weighted center-based clustering through the Ordered Weighted ℓ1 (OWL) norm for better feature selection. Appealing to the elegant properties of block coordinate descent and Frank-Wolfe algorithms, we not only maintain computational efficiency but also outperform the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of O(√(k log p / n)) under model sparsity, bridging the gap between the theory and practice of weighted clustering.
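For readers unfamiliar with the OWL norm named in the abstract, the following is a minimal sketch of its standard definition from the literature: the entries of a vector are sorted by decreasing magnitude and paired with a non-negative, non-increasing weight sequence. The function name `owl_norm` and the example values are illustrative assumptions; the abstract does not state how the paper embeds this norm in its clustering objective, so this shows only the norm itself.

```python
def owl_norm(x, w):
    """Ordered Weighted l1 norm: sum_i w[i] * |x|_(i),
    where |x|_(1) >= |x|_(2) >= ... are the sorted magnitudes of x."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # The weights must be non-negative and non-increasing for this to be a norm.
    if any(wi < 0 for wi in w) or any(a < b for a, b in zip(w, w[1:])):
        raise ValueError("w must be non-negative and non-increasing")
    # Sort magnitudes in decreasing order so the largest entry gets the largest weight.
    magnitudes = sorted((abs(xi) for xi in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, magnitudes))


x = [0.5, -3.0, 1.0]
print(owl_norm(x, [3.0, 2.0, 1.0]))  # 3*3.0 + 2*1.0 + 1*0.5 = 11.5
print(owl_norm(x, [1.0, 1.0, 1.0]))  # equal weights recover the l1 norm: 4.5
print(owl_norm(x, [1.0, 0.0, 0.0]))  # a single leading weight recovers the max norm: 3.0
```

The two special cases in the usage lines illustrate why the norm is of interest for feature selection: it interpolates between the ℓ1 and ℓ∞ norms depending on the choice of weights.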

First Page

7176

Last Page

7189

Publication Date

1-1-2023

