Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data
Article Type
Research Article
Publication Title
Journal of Multivariate Analysis
Abstract
Testing homogeneity of k (≥ 2) multivariate distributions is a challenging problem in statistics, especially when the dimension of the data is much larger than the sample size. Most existing tests perform poorly in this high dimension, low sample size (HDLSS) regime, and many of them cannot be used at all. In this article, we propose some nonparametric tests for this purpose. These tests are distribution-free in finite samples. They are based on a high dimensional clustering algorithm that partitions the data to form a contingency table, and the test statistics are constructed from the cell frequencies of that table. We can develop tests based on a k-partition of the data, or estimate the number of partitions from the data and construct tests based on it. Under appropriate regularity conditions, we prove the consistency of these tests in the HDLSS asymptotic regime. We also consider a multiscale approach, where the results for different numbers of partitions are aggregated judiciously. An extensive simulation study and analyses of some benchmark datasets illustrate the superiority of the proposed tests over some existing methods.
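The construction outlined in the abstract (cluster the pooled data, cross-tabulate sample membership against cluster membership, and test using the cell frequencies) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the abstract does not name the clustering algorithm or the test statistics, so plain Lloyd's k-means and a Pearson chi-square statistic are used here as stand-ins, and the null distribution is approximated by Monte Carlo permutation of the sample labels rather than computed exactly. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iters=20):
    """Lloyd's k-means with farthest-first initialization.

    Assumption: a stand-in for the high dimensional clustering algorithm,
    which the abstract does not specify. It never looks at sample labels.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from its nearest current center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def contingency_table(sample_ids, cluster_ids, k, m):
    """k x m table: rows are samples, columns are clusters."""
    table = [[0] * m for _ in range(k)]
    for s, c in zip(sample_ids, cluster_ids):
        table[s][c] += 1
    return table


def pearson_statistic(table):
    """Pearson chi-square statistic (assumed choice of statistic)."""
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat


def permutation_pvalue(sample_ids, cluster_ids, k, m, n_perm=2000, seed=0):
    """Monte Carlo p-value obtained by permuting sample labels.

    The cluster labels are held fixed, so both table margins stay fixed and
    the reference distribution does not depend on the underlying distribution.
    """
    rng = random.Random(seed)
    observed = pearson_statistic(contingency_table(sample_ids, cluster_ids, k, m))
    shuffled = list(sample_ids)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = pearson_statistic(contingency_table(shuffled, cluster_ids, k, m))
        if stat >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy HDLSS data: two samples of size 10 in dimension 200, differing in mean.
    rng = random.Random(42)
    dim, size = 200, 10
    pooled = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]
    pooled += [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(size)]
    samples = [0] * size + [1] * size
    clusters = kmeans_labels(pooled, k=2)
    print(contingency_table(samples, clusters, 2, 2))
    print(permutation_pvalue(samples, clusters, 2, 2))
```

Because the clustering step uses only the pooled observations, the sample labels are exchangeable given the partition when all k distributions are equal, which is what makes a reference distribution based on the table with fixed margins free of the underlying distribution. The multiscale variant mentioned in the abstract would repeat this for several numbers of clusters and combine the results; that aggregation step is not sketched here.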
DOI
10.1016/j.jmva.2021.104897
Publication Date
7-1-2022
Recommended Citation
Paul, Biplab; De, Shyamal K.; and Ghosh, Anil K., "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data" (2022). Journal Articles. 3067.
https://digitalcommons.isical.ac.in/journal-articles/3067