K-Means clustering with a new divergence-based distance metric: Convergence and performance analysis
Pattern Recognition Letters
The choice of a proper similarity/dissimilarity measure is crucial in cluster analysis for revealing the natural grouping in a given dataset, and selecting the most appropriate measure has long been an open problem. Among the various approaches to incorporating a non-Euclidean dissimilarity measure for clustering, the use of divergence-based distance functions has recently gained attention from the perspective of partitional clustering. Following this direction, we propose a new point-to-point distance measure, called the S-distance, motivated by the recently developed S-divergence measure (originally defined on the open cone of positive definite matrices), and discuss some of its important properties. We subsequently develop the S-k-means algorithm (with Lloyd's heuristic), which replaces the conventional Euclidean distance of k-means with the S-distance. We also provide a theoretical analysis of the S-k-means algorithm, establishing the convergence of the obtained partial optimal solutions to a locally optimal solution. The performance of S-k-means is compared with that of the classical k-means algorithm with the Euclidean distance metric and its feature-weighted variants on several synthetic and real-life datasets. The comparative study indicates that our results are appealing, especially when the distribution of the clusters is not regular.
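The idea of swapping the Euclidean distance in Lloyd's heuristic for a divergence-based distance can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the S-distance on strictly positive vectors reduces coordinate-wise to the scalar form of the S-divergence, log((a+b)/2) − ½ log(ab), and it uses the arithmetic mean as a placeholder centroid update (the paper derives the exact minimizer for the S-distance).

```python
import numpy as np

def s_distance(x, y):
    """Coordinate-wise scalar reduction of the S-divergence between strictly
    positive vectors x and y (assumed form for illustration):
        D(a, b) = log((a + b) / 2) - 0.5 * log(a * b), summed over coordinates."""
    return np.sum(np.log((x + y) / 2.0) - 0.5 * np.log(x * y))

def s_kmeans(X, k, n_iter=50, seed=0):
    """Lloyd-style k-means on positive data, with the Euclidean distance
    replaced by the S-distance in the assignment step.  The centroid update
    uses the arithmetic mean as a simple stand-in for the exact minimizer."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center under the S-distance.
        d = np.array([[s_distance(x, c) for c in centers] for x in X])
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # partition is stable; Lloyd iterations have converged
        labels = new_labels
        # Update step (placeholder): recompute each non-empty cluster's center.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

Because the S-distance grows with the log of the coordinate ratios rather than their absolute difference, the assignment step behaves quite differently from Euclidean k-means on data spanning several scales, which is consistent with the abstract's observation about irregular cluster shapes.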
Chakraborty, Saptarshi and Das, Swagatam, "K-Means clustering with a new divergence-based distance metric: Convergence and performance analysis" (2017). Journal Articles. 2324.