Journal Articles

On Perfect Clustering of High Dimension, Low Sample Size Data

Soham Sarkar, Ecole Polytechnique Fédérale de Lausanne
Anil K. Ghosh, Indian Statistical Institute, Kolkata

Article Type

Research Article

Publication Title

IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

Popular clustering algorithms based on usual distance functions (e.g., the Euclidean distance) often suffer in high dimension, low sample size (HDLSS) situations, where concentration of pairwise distances and violation of neighborhood structure have adverse effects on their performance. In this article, we use a new data-driven dissimilarity measure, called MADD, which addresses these problems. MADD uses the distance concentration phenomenon to its advantage, and as a result, clustering algorithms based on MADD usually perform well for high dimensional data. We establish this through theoretical as well as numerical studies. We also address the problem of estimating the number of clusters. This is a challenging problem in cluster analysis, and several algorithms are available for it. We show that many of these existing algorithms perform better in high dimensions when they are constructed using MADD. We also construct a new estimator based on a penalized version of the Dunn index and prove its consistency in the HDLSS asymptotic regime. Several simulated and real data sets are analyzed to demonstrate the usefulness of MADD for cluster analysis of high dimensional data.
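
The abstract does not define MADD. As a rough illustration only, the sketch below assumes that MADD is the mean absolute difference of distances: for two observations, the average, over all remaining observations, of the absolute difference between their Euclidean distances to each third point. The function name `madd_matrix` and the toy data are hypothetical and are not taken from the article; any scaling or generalization used by the authors is not reproduced here.

```python
import numpy as np


def madd_matrix(X):
    """Pairwise dissimilarities under the ASSUMED definition
    rho(i, j) = mean over k not in {i, j} of |d(i, k) - d(j, k)|,
    where d is the Euclidean distance. Illustrative sketch only."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 3:
        raise ValueError("need at least 3 observations")

    # Ordinary Euclidean distance matrix (n is small in HDLSS settings).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    rho = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Exclude the two points being compared from the average.
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False
            rho[i, j] = rho[j, i] = np.abs(D[i, mask] - D[j, mask]).mean()
    return rho


# Hypothetical toy data: two well-separated groups of two points each.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
rho_toy = madd_matrix(X_toy)
```

On this toy data, the dissimilarity between points in the same group is much smaller than between points in different groups. The resulting matrix could be passed to any clustering routine that accepts precomputed dissimilarities.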

First Page

2257

Last Page

2272

DOI

10.1109/TPAMI.2019.2912599

Publication Date

9-1-2020

Comments

Open Access, Green

Recommended Citation

Sarkar, Soham and Ghosh, Anil K., "On Perfect Clustering of High Dimension, Low Sample Size Data" (2020). Journal Articles. 147.
https://digitalcommons.isical.ac.in/journal-articles/147

