Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data

Article Type

Research Article

Publication Title

Journal of Multivariate Analysis


Testing the homogeneity of k (≥ 2) multivariate distributions is a challenging problem in statistics, especially when the dimension of the data is much larger than the sample size. Most existing tests perform poorly in this high dimension, low sample size (HDLSS) regime, and many of them cannot be used at all. In this article, we propose some nonparametric tests for this purpose. These tests are distribution-free in finite samples. They are based on a high-dimensional clustering algorithm that partitions the data to form a contingency table, and the test statistics are constructed from the cell frequencies of that table. The tests can be based on a k-partition of the data, or the number of partitions can be estimated from the data and the tests constructed from the resulting partition. Under appropriate regularity conditions, we prove the consistency of these tests in the HDLSS asymptotic regime. We also consider a multiscale approach, where the results for different numbers of partitions are aggregated judiciously. An extensive simulation study and analyses of some benchmark datasets illustrate the superiority of the proposed tests over some existing methods.
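The general recipe in the abstract (cluster the pooled data, cross-tabulate sample membership against cluster membership, test for association) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the article's method: the clustering step is plain k-means with a farthest-point start, used as a stand-in for the high-dimensional clustering algorithm; the statistic is Pearson's chi-square; and the null distribution is approximated by Monte Carlo permutation of the sample labels with the partition held fixed. The function names are hypothetical. Because the clustering never sees the sample labels, permuting those labels under the null hypothesis is what makes a test of this kind distribution-free.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans_labels(points, k, iters=20):
    # Stand-in clustering (assumption): Lloyd's k-means, farthest-point start.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)

def contingency(sample_ids, cluster_ids, n_rows, n_cols):
    # Rows index the samples, columns index the clusters.
    table = [[0] * n_cols for _ in range(n_rows)]
    for s, c in zip(sample_ids, cluster_ids):
        table[s][c] += 1
    return table

def chi_square(table):
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat

def clustering_test(samples, n_perm=999, seed=0):
    # samples: list of k samples, each a list of d-dimensional points.
    rng = random.Random(seed)
    k = len(samples)
    pooled = [x for s in samples for x in s]
    sample_ids = [i for i, s in enumerate(samples) for _ in s]
    cluster_ids = kmeans_labels(pooled, k)  # labels are not used here
    observed = chi_square(contingency(sample_ids, cluster_ids, k, k))
    exceed = 0
    perm = sample_ids[:]
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel observations, partition stays fixed
        if chi_square(contingency(perm, cluster_ids, k, k)) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = random.Random(1)
    d = 200  # dimension much larger than the sample size of 10 per group
    a = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(10)]
    b = [[rng.gauss(3, 1) for _ in range(d)] for _ in range(10)]
    print(clustering_test([a, b]))
```

The permutation step only approximates the conditional null distribution of the table given its margins; an exact version would enumerate or compute that distribution directly, which is feasible for small tables.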




