Entropy-weighted medoid shift: An automated clustering algorithm for high-dimensional data

Article Type

Research Article

Publication Title

Applied Soft Computing

Abstract

Uncovering the intrinsic structure of high-dimensional data is a significant challenge, particularly when clusters lie in lower-dimensional subspaces rather than in the full feature space. This situation is common in real-world datasets such as text documents and images, which often contain many noisy or sparse features, and traditional clustering methods often overlook these latent subspace structures. This paper introduces a novel subspace-based clustering algorithm designed explicitly to address this challenge. Building on the robust medoid shift framework, we integrate a dimensionality reduction scheme that dynamically projects the data onto evolving subspaces determined through entropy-constrained optimization. This approach filters out irrelevant information and identifies the underlying clusters, optimizing the subspace representation while avoiding trivial solutions. Unlike existing methods, our algorithm ensures convergence without requiring stopping criteria, which enables efficient processing of large datasets. Extensive experiments on synthetic and real-world datasets demonstrate substantial performance improvements over state-of-the-art techniques. By explicitly uncovering the underlying subspace structures, our method opens new avenues for high-dimensional data clustering and offers insight into complex data environments.
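The abstract does not give the update equations, so the following is only a rough illustration of the two ingredients it names, not the authors' algorithm. It assumes the entropy-regularized feature weights familiar from entropy-weighted k-means (minimizing sum_j w_j D_j + gamma sum_j w_j log w_j subject to sum_j w_j = 1, which gives w_j proportional to exp(-D_j / gamma)) and a Gaussian-kernel medoid shift step computed with those weights. The function names, `gamma`, and `bandwidth` are hypothetical.

```python
import numpy as np


def entropy_weights(dispersion, gamma=1.0):
    """Feature weights minimizing sum_j w_j*D_j + gamma*sum_j w_j*log(w_j)
    subject to sum_j w_j = 1 (assumed form, not taken from the paper).

    The closed-form solution is a softmax of -D_j / gamma. The entropy term
    keeps every weight strictly positive, so the weights cannot collapse
    onto a single feature.
    """
    dispersion = np.asarray(dispersion, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability.
    z = -(dispersion - dispersion.min()) / gamma
    w = np.exp(z)
    return w / w.sum()


def medoid_shift_step(X, weights, bandwidth=1.0):
    """One medoid-shift-style step under feature-weighted squared distances.

    For each point i, return the index c of the data point minimizing
    sum_j K[i, j] * d2[c, j], where K is a Gaussian kernel on the weighted
    squared distance d2. Candidates are restricted to the data points.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]           # shape (n, n, d)
    d2 = (diff ** 2 * weights).sum(axis=-1)        # weighted squared distances
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))       # kernel weights K[i, j]
    cost = K @ d2.T                                # cost[i, c]
    return cost.argmin(axis=1)


# Toy data: feature 0 separates two groups, feature 1 is noise.
X = np.array([
    [0.0, 5.0], [0.1, -3.0], [0.2, 1.0],
    [10.0, 4.0], [10.1, -2.0], [10.2, 0.5],
])
# Hypothetical per-feature dispersions: small for feature 0, large for feature 1.
w = entropy_weights(np.array([0.01, 10.0]), gamma=1.0)
next_idx = medoid_shift_step(X, w, bandwidth=1.0)
```

In this toy setting the noisy feature receives a weight close to zero, so each point is shifted to a medoid chosen almost entirely on the informative feature. A full method would alternate such steps with re-estimating the dispersions, which the sketch leaves out.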

DOI

10.1016/j.asoc.2024.112347

Publication Date

1-1-2025
